Lingyu Gao

AI Research Engineer

Duolingo

About Me

Hello! I’m Lingyu, an AI Research Engineer at Duolingo. I hold a Ph.D. in Computer Science from Toyota Technological Institute at Chicago, where I worked with Prof. Kevin Gimpel on Natural Language Processing.

My work focuses on LLM-based systems for educational and multilingual applications, especially text retrieval, generation, and evaluation. I am interested in turning ambiguous language problems into measurable, scalable AI systems that connect research with real-world product needs.

Expertise
  • Large Language Models
  • Educational & Multilingual AI
  • Text Retrieval, Generation & Evaluation
Education
  • Toyota Technological Institute at Chicago

    Ph.D. in Computer Science (M.S. in Computer Science earned within the Ph.D. program)

  • Tsinghua University

    M.E. in Electrical Engineering, B.E. in Electrical Engineering and Automation

Skills

Programming & Data

Python, PyTorch, TensorFlow, SQL, pandas, NumPy

Infrastructure & Tools

Airflow, BigQuery, AWS, Pinecone, Git, LaTeX

Languages

Mandarin, English

Work Experience & Internships

AI Research Engineer
Duolingo
January 2025 – Present | Pittsburgh, PA, USA
  • Building LLM-based systems for educational and multilingual applications, with a focus on retrieval, generation, and evaluation.
  • Designing model workflows and evaluation methods that bridge research prototypes and production use cases.

AI Engineer
Educational Testing Service
July 2024 – December 2024 | Princeton, NJ, USA
  • Fine-tuned LLMs with domain-specific data using supervised fine-tuning and preference optimization.
  • Collaborated with test developers, engineers, and scientists on high-quality test item generation.
  • Developed multi-agent LLM approaches for automatic scoring.

Research Intern
Google LLC
May 2023 – August 2023 | Mountain View, CA, USA
  • Developed demonstration selection methods for retrieval-based in-context learning.
  • Improved text classification performance over a strong retrieval baseline and outperformed fine-tuned models on several datasets.
  • This work led to a related arXiv preprint.

Research Intern
TikTok Inc.
May 2022 – August 2022 | Remote
  • Fine-tuned style-specific T5 models for controllable question generation with keyword, topic, and length constraints.
  • Improved macro-F1 by 21% over a keyword-extraction baseline.

Intern
Educational Testing Service
June 2021 – August 2021 | Remote
  • Fine-tuned models for inquisitive question generation and built a pairwise ranker to select high-quality questions.
  • Released an open-source codebase and published the work at *SEM 2022.

Publications

WhoSaidIt: Human-LLM Collaborative Annotation for Text-Based Multilingual Speaker-Attribute Classification. arXiv preprint, 2026.

Ambiguity-Aware In-context Learning with Large Language Models. arXiv preprint, 2023.

Awards & Honors

  • 2021: ETS Pre-Doctoral Fellowship
  • 2014: Mitsubishi Heavy Industries Scholarship (one of 25 recipients selected from approximately 180 candidates)
  • 2013: NARI-RELAYS Scholarship (approximately top 15%)
  • 2011: 1st Grade Academic Excellence Scholarship (ranked 6th of 120 candidates)
  • 2010: 2nd Grade Freshman Scholarship (ranked 2nd in the province)

Teaching & Services

Teaching:

  • Teaching Assistant, Introduction to Machine Learning (2019)

Reviewer Services:

  • Reviewer for ARR, ACL, EMNLP, NAACL-HLT, COLM, CoNLL, BEA, TALLIP, and FigLang
  • Secondary Reviewer for EMNLP 2019 and RepL4NLP 2020

Other Services & Activities:

  • Research Volunteer, Circle Cat (2023 – Present)
  • Student Member, TTIC DEI Committee and Ph.D. Admissions Committee (2020 – 2021)