Sang Yun Kwon

I'm a first year Ph.D. student at University of British Columbia (UBC), School of Information (iSchool) , working with Prof. Muhammad Abdul-Mageed . I am a member of UBC Deep Learning and NLP Lab and UBC NLP Group . I got my MLIS from UBC ischool, where i was advised by Prof. Muhammad Abdul-Mageed and Prof. Victoria Lemieux . I did my Bachelor degree at Yonsei University, Seoul, South Korea.

Email / CV / Scholar / Twitter / LinkedIn /

profile photo

Research

My research interests lie in natural language processing and machine learning, with a focus on cultural representation and inclusivity in large language models. I am particularly interested in developing technologies that serve underrepresented language communities and bridge cross-cultural gaps in NLP systems.

News

  • We have a paper accepted to EMNLP 2025.
  • We have a paper accepted to NAACL 2025.
  • We have a paper accepted to EMNLP 2024.
  • We have a paper accepted to ArabicNLP 2024.
  • I will join UBC as a new Ph.D. student in Fall 2024.

Publications

Voice of a Continent: Mapping Africa's Speech Technology Frontier
AbdelRahim Elmadany, Sang Yun Kwon, Hawau Olamide Toyin, Alcides Alcoba Inciarte, Hanan Aldarmaki, Muhammad Abdul-Mageed
EMNLP, 2025
Paper

A comprehensive survey mapping the current state of speech technology for African languages.

JAWAHER: A Multidialectal Dataset of Arabic Proverbs for LLM Benchmarking
Sang Yun Kwon, Samar Magdy, Fakhraddin Alwajih, Safaa Taher Abdelfadil, Shady Shehata, Muhammad Abdul-Mageed
NAACL, 2025
Paper

A multidialectal dataset of Arabic proverbs for LLM Benchmarking.

Gazelle: An Instruction Dataset for Arabic Writing Assistance
Samar Magdy, Sang Yun Kwon, Fakhraddin Alwajih, Reem Abdel-Salam, Muhammad Abdul-Mageed
EMNLP, 2024 | Oral Presentation
Paper

A comprehensive dataset and evaluation framework for Arabic writing assistance.

Beyond English: Evaluating LLMs for Arabic Grammatical Error Correction
Sang Yun Kwon, Gagan Bhatia, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed
ArabicNLP co-located with ACL, 2024 | Oral Presentation
Paper

Exploring the performance of instruction-finetuned large language models (LLMs) in Arabic Grammatical Error Correction.

SIDLR: Slot and Intent Detection Models for Low-Resource Language Varieties
Sang Yun Kwon, Gagan Bhatia, El Moatez Billah Nagoudi, Alcides Alcoba Inciarte, Muhammad Abdul-Mageed
VarDial workshop co-located with EACL , 2023
Paper

Shared task on slot and intent detection for low-resource language varieties.


Design and source code from John Barron's website.