I am an undergraduate student at Yale University double majoring in computer science and economics (expected May 2025). My research interests lie broadly in multilingual speech processing, natural language processing, and multimodal translation. In industry, I worked as an AI research engineer intern on the text-to-speech (TTS) teams at NAVER Cloud (natural language processing; Seongnam, South Korea) and Samsung Electronics (acoustic processing; Suwon, South Korea). In academia, I worked as a research intern in Professor Alham Fikri Aji's lab in the natural language processing department of MBZUAI in Abu Dhabi, United Arab Emirates.


I currently speak three languages (English, Korean, and Spanish) but hope to become a true polyglot in the future! (Studying French has always been a long-time wish of mine.) I enjoy learning languages because it opens me up to different cultures and perspectives.

My email is sophiayk20 [at] gmail [dot] com. My profiles will be available at: Google Scholar, OpenReview, and ACL Anthology.

News

  • [8/13/2024]: CoVoSwitch was selected as a Spotlight Paper at the ACL 2024 Student Research Workshop. I gave a poster presentation on August 11th and an oral presentation on August 12th!

  • [7/9/2024]: My first paper was accepted to the Student Research Workshop at ACL 2024. I was also offered a travel grant for attending ACL in person. Beyond excited to be in Bangkok, Thailand!

  • [6/26/2024]: Four teammates (from the U.S., China, India, and Vietnam) and I received the best team award at MBZUAI UGRIP from Professor Timothy Baldwin, Provost of MBZUAI. We ranked first out of 9 ML/CV/NLP teams for the research content of our project on multilingual, multitask statement tuning of encoder models.

  • [6/6/2024]: I was accepted to Interspeech YFRSW 2024. I was offered a scholarship to attend and present my speech processing research at Interspeech 2024 in Kos, Greece.

Publications

2024

  • CoVoSwitch: Machine Translation of Synthetic Code-Switched Text Based on Intonation Units
    Yeeun Kang.
    Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, 2024
    Paper and Code

Selected Projects

  • Multilingual, Multitask Statement Tuning for Encoder Models
    May 2024 - Jun 2024 (@MBZUAI)
    Created multilingual NLU datasets and made them available through HuggingFace. Evaluated zero-shot performance of encoder models on various NLU tasks.
    Topics: encoder models, statement tuning, NLU
    [HuggingFace Org Repo], [Final Presentation Slides], and [Blog Post]

  • Code-Switched Text Dataset Creation by Intonation Unit Detection and Replacement
    Apr 2024 - Jun 2024 (Independent Project)
    Work accepted at ACL-SRW 2024.
    Detected intonation unit boundaries of utterances in CoVoST 2 (a speech-to-text translation dataset) with PSST, a pre-trained speech segmentation model built on Whisper (STT), and used these prosodic boundaries to create code-switched text. Evaluated the performance of current SOTA NMT models on 13 languages, including low-resource languages such as Welsh, Mongolian, and Tamil. I named the resulting synthetic dataset CoVoSwitch.
    Topics: prosodic speech segmentation, speech recognition (STT), neural machine translation (NMT), code-switching
    [Paper], [Code], and [CoVoSwitch on HuggingFace Datasets]

  • Undisclosed Project using LLMs
    Jan 2024 - Apr 2024 (@NAVER Cloud)
    Intern project at NAVER Cloud.
    Paper TBD.

  • Fine-tuning Whisper for Speech Recognition and Transcription
    Aug 2023 - Dec 2023 (@Yale University)
    Course final project for Yale’s CPSC 488/588: AI Foundation Models. Took the course alongside MS and PhD students and received full marks.
    [Code]

  • Refining Custom Voice Metric
    Jun 2023 - Aug 2023 (@Samsung Electronics)
    Intern project at Samsung Electronics.
    Topics: speech synthesis (TTS), mean opinion score (MOS), custom voice on Bixby
    [Blog Post]

Teaching

I was involved in different computer science education initiatives as an undergrad at Yale.

I was a TA (Teaching Assistant) for the following courses:

  • CPSC 223: Data Structures and Programming Techniques (C, C++) [Jan 2023 - Dec 2023]
  • CS50: Introduction to Computing and Programming (C, Python, SQL, JavaScript) [Aug 2022 - Dec 2023]

and a mentor for:

  • Code Haven [Sep 2021 - May 2022]
    • Taught middle school students in New Haven, Connecticut how to code in Scratch.

Other Fun Things

I participated in HackMIT 2022, held in Cambridge, Massachusetts. My team of four, formed on site with 3 teammates I met there (2 from the US and 1 from Canada), was named a finalist.

I was also at the CS50 Hackathon at Harvard in Fall 2022, where I pulled an all-nighter(!!) helping Harvard and Yale students with their creative projects.

Last updated

I last updated this page on Aug 15, 2024.