I'm a recent graduate of Yale University with a B.S. degree in Computer Science and Economics, with an interest in speech processing and natural language processing. During my undergraduate studies, I had the opportunity to present my research in speech and NLP at Interspeech 2024 (Kos, Greece) and ACL 2024 (Bangkok, Thailand), supported by named grants and conference scholarships. I also worked in industry on the text-to-speech teams of NAVER and Samsung Electronics in South Korea. Outside of work, I enjoy studying foreign languages (I'm finally starting my fourth language in 2025!!) and exploring historic sites. I'm interested in themes of memorialization and the selective preservation/destruction of material cultural heritage.

After graduating from Yale, I'll begin full-time work at Amazon as a Software Development Engineer in New York City.


My email is sophiayk20 [at] gmail [dot] com. My profiles are available at: Google Scholar, OpenReview, and ACL Anthology.


News

🎓 May 19, 2025
Obtained my B.S. degree in Computer Science and Economics from Yale University.
📄 May 15, 2025
🎉 My first co-authored paper was accepted to ACL 2025 Findings!
This paper holds special meaning to me as it was our team's joint research project at MBZUAI.
🌟 August 13, 2024
CoVoSwitch was selected as a Spotlight Paper at the ACL 2024 Student Research Workshop.
🧾 Gave a poster presentation on August 11 and an oral presentation on August 12!
📝 July 9, 2024
My first paper was accepted to the Student Research Workshop at ACL 2024.
โœˆ๏ธ I was also awarded a travel grant to attend ACL in person in Bangkok, Thailand.
๐Ÿ… June 26, 2024
Our team of 5 (from the U.S., China, India, and Vietnam) at MBZUAI UGRIP received the Best Team Award from Prof. Timothy Baldwin, Provost of MBZUAI.
🥇 Ranked 1st out of 9 ML/CV/NLP teams for research content on multilingual, multitask statement tuning of encoder models.
🗣️ June 6, 2024
Accepted to Interspeech YFRSW 2024!
🎤 Awarded a scholarship to attend and present my speech processing research at Interspeech 2024 in Kos, Greece.



Publications

*: Equal contribution

2025

  • Statement-Tuning Enables Efficient Cross-lingual Generalization in Encoder-only Models
    Ahmed Elshabrawy, Thanh-Nhi Nguyen, Yeeun Kang*, Lihan Feng*, Annant Jain*, Faadil Abdullah Shaikh*, Jonibek Mansurov, Mohamed Fazli Mohamed Imam, Jesus-German Ortiz-Barajas, Rendi Chevi, Alham Fikri Aji.
    Findings of the Association for Computational Linguistics: ACL 2025, 2025

    Paper

2024

  • CoVoSwitch: Machine Translation of Synthetic Code-Switched Text Based on Intonation Units
    Yeeun Kang.
    Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, 2024 [Oral, Spotlight Paper]
    Paper and Code

Selected Projects

Visual Question Answering

Jan 2025 - May 2025 (@Yale University)

Analyzed the performance of two visual question answering models, ViLT (classification) and BLIP (generative), on Stanford's GQA (Grounded Question Answering) image dataset.

Topics: visual question answering
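
As a rough sketch of the two model families compared here: ViLT treats VQA as classification over a fixed answer vocabulary, while BLIP generates the answer as free-form text. The snippet below shows how such models can be queried through Hugging Face transformers; the checkpoints and the example image and question are illustrative placeholders, not necessarily the setup used in the project.

```python
# Minimal sketch (not the project's actual evaluation code): query a
# classification-style VQA model (ViLT) and a generative one (BLIP).
# The checkpoints, image path, and question are placeholders.
from PIL import Image
from transformers import (
    ViltProcessor, ViltForQuestionAnswering,
    BlipProcessor, BlipForQuestionAnswering,
)

image = Image.open("example.jpg")    # placeholder image
question = "What color is the bus?"  # placeholder question

# ViLT: picks an answer from a fixed vocabulary via a classification head
vilt_proc = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
vilt = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
vilt_inputs = vilt_proc(image, question, return_tensors="pt")
vilt_answer = vilt.config.id2label[vilt(**vilt_inputs).logits.argmax(-1).item()]

# BLIP: decodes the answer token by token with a text generator
blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
blip = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")
blip_inputs = blip_proc(image, question, return_tensors="pt")
blip_answer = blip_proc.decode(blip.generate(**blip_inputs)[0], skip_special_tokens=True)

print(vilt_answer, blip_answer)
```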

Multilingual, Multitask Statement Tuning for Encoder Models

May 2024 - Jun 2024 (@MBZUAI)

Created multilingual NLU datasets and made them available through HuggingFace. Evaluated the zero-shot performance of encoder models on various NLU tasks. Published in Findings of ACL 2025.

Topics: encoder models, statement tuning, NLU

HuggingFace Org Repo, Final Presentation Slides, and Blog Post
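
As a loose illustration of the statement-tuning idea (not the exact setup from the paper): each candidate label of an NLU example is verbalized into a natural-language statement, and a true/false encoder classifier scores the statements; the highest-scoring one becomes the zero-shot prediction. The checkpoint, templates, and label convention below are placeholders.

```python
# Hedged sketch of statement-tuning-style zero-shot inference. In practice a
# statement-tuned encoder checkpoint would be used; "xlm-roberta-base" with a
# fresh 2-way head is only a stand-in so the snippet runs end to end.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "xlm-roberta-base"  # placeholder checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

premise = "The cat is sleeping on the sofa."
hypothesis = "An animal is resting."
statements = {
    "entailment":    f'Given "{premise}", it is true that "{hypothesis}".',
    "contradiction": f'Given "{premise}", it is false that "{hypothesis}".',
    "neutral":       f'Given "{premise}", it is unclear whether "{hypothesis}".',
}

scores = {}
for label, statement in statements.items():
    inputs = tok(statement, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    scores[label] = torch.softmax(logits, dim=-1)[0, 1].item()  # assumed "true" index

print(max(scores, key=scores.get))  # zero-shot label prediction
```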

Code-Switched Text Dataset Creation by Intonation Unit Detection and Replacement

Apr 2024 - Jun 2024 (Independent Project)

Work accepted at ACL-SRW 2024. Detected intonation unit boundaries of utterances in CoVoST 2, a speech-to-text translation dataset, with PSST, a speech segmentation model fine-tuned from Whisper (STT), and used them to create code-switched text that leverages prosodic features. Evaluated the performance of current SOTA NMT models on 13 languages, including low-resource languages such as Welsh, Mongolian, and Tamil. I named the resulting synthetic dataset CoVoSwitch.

Topics: prosodic speech segmentation, speech recognition (STT), neural machine translation (NMT), code-switching

Paper, Code, and CoVoSwitch on HuggingFace Datasets
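
To give a feel for the replacement step (boundary detection itself is handled by the segmentation model and omitted here), the sketch below swaps one already-detected intonation unit of an English utterance with a translation. The spans and the German translation are hard-coded purely for illustration.

```python
# Rough sketch of the unit-replacement idea: given intonation-unit boundaries
# (assumed to come from a speech segmentation model), replace selected units
# with their translations to synthesize code-switched text.
english = "I went to the market because we ran out of rice"
unit_spans = [(0, 20), (21, 47)]                      # assumed character spans of intonation units
unit_translations = {1: "weil uns der Reis ausging"}  # illustrative translation of the second unit

def code_switch(text, spans, translations):
    """Rebuild the utterance, replacing selected units with their translations."""
    pieces = []
    for i, (start, end) in enumerate(spans):
        pieces.append(translations.get(i, text[start:end]))
    return " ".join(pieces)

print(code_switch(english, unit_spans, unit_translations))
# -> "I went to the market weil uns der Reis ausging"
```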

Undisclosed Project using LLMs

Jan 2024 - Apr 2024 (@NAVER Cloud)

Intern project at NAVER Cloud.

Fine-tuning Whisper for Speech Recognition and Transcription

Aug 2023 - Dec 2023 (@Yale University)

Course final project for Yale's CPSC 488/588: AI Foundation Models. Took the course alongside MS and PhD students and received full marks.

Code

Refining Custom Voice Metric

Jun 2023 - Aug 2023 (@Samsung Electronics)

Intern project at Samsung Electronics.

Topics: speech synthesis (TTS), mean opinion score (MOS), custom voice on Bixby

Blog Post




Teaching

I was involved in several computer science education initiatives as an undergrad at Yale.

I was a TA (Teaching Assistant) for the following courses:

  • CPSC 223: Data Structures and Programming Techniques (C, C++) [Jan 2023 - Dec 2023]
  • CS50: Introduction to Computing and Programming (C, Python, SQL, JavaScript) [Aug 2022 - Dec 2023]

and a mentor for:

  • Code Haven [Sep. 2021 - May 2022]
• Taught middle school students in New Haven, Connecticut, how to code in Scratch.

Other Fun Things

I participated in HackMIT 2022 in Cambridge, Massachusetts. Together with 3 teammates I met on site (2 from the US and 1 from Canada), we were named finalists at the hackathon.

I was also at the CS50 Hackathon at Harvard in Fall 2022, where I pulled an all-nighter(!!) helping Harvard and Yale students with their creative projects.

Random Fact

My favorite course at Yale was HSAR 247: Art and Myth in Greek Antiquity (Fall 2022), where I wrote a course final paper on the Parthenon as a palimpsest. This course met in the Yale University Art Gallery and motivated me to visit the Acropolis Museum in Athens in 2024 and the British Museum in London in 2025.

Last updated

I last updated this page on May 16, 2025.