About Me

Hello! I’m Lingyu, a final-year CS Ph.D. student specializing in natural language processing, and I’m very fortunate to have Prof. Kevin Gimpel as my advisor. My primary focus is on text classification and generation, with the aim of identifying and enhancing the capabilities of the generative components of pretrained language models.

Expertise
  • Natural Language Processing
  • Deep Learning & Machine Learning
  • Data Analysis & Visualization
Education
  • Toyota Technological Institute at Chicago

    Ph.D. in Computer Science (ongoing), M.S. in Computer Science (en route to the Ph.D.)

  • Tsinghua University

    M.E. in Electrical Engineering, B.E. in Electrical Engineering and Automation

Skills

Programming & Libraries

Python, PyTorch, TensorFlow, pandas, and more

Documentation Tools

MS Office, LaTeX, Version Control (Git)

Languages

Mandarin, English

Internships

Google LLC.
Research Intern
May 2023 – August 2023, Mountain View, CA, USA

Target: Selecting Better In-Context Learning Demonstrations for Text Classification

Key Skills: TensorFlow, pandas, Python, NumPy, LaTeX

Models: Flan-PaLM 2, off-the-shelf retriever (fine-tuned from mT5)

  • Wrote over 100 pages of documentation and more than 4,000 lines of code, and prepared a paper for submission.
  • Achieved a 2.6% gain in macro-F1 over a strong baseline that already matches or exceeds current benchmarks.
  • Proposed constraints for demonstration selection that are potentially adaptable to other applications, including ranking.

TikTok Inc.
Research Intern
May 2022 – August 2022, Remote

Target: Generating Questions of Different Styles Controlled with Keywords

Key Skills: PyTorch, PyTorch Lightning, Python, NumPy

Models: T5, mT5, ByT5

  • Authored over 3,600 lines of code.
  • Demonstrated that a T5 model enhanced with additional tokens, such as emojis, outperforms other models at jointly generating keywords and topics, and surpasses spaCy on keyword extraction.
  • Generated questions controlled by keywords, topics, and a specified length. Found that using a separate model for each question style yields better results than a single shared model.

Educational Testing Service
Intern
June 2021 – August 2021, Remote

Target: Generating and Ranking Inquisitive Questions Controlled with Question Types

Key Skills: PyTorch, Fairseq, pandas, Python, NumPy, LaTeX

Models: RoBERTa, BART

  • Code is publicly available on GitHub (5,000+ lines). Our paper was accepted for presentation at *SEM 2022.
  • Produced diverse questions tailored to specific question types.
  • Used a pairwise ranker to select generated questions that matched the quality of human-written questions in syntax, semantics, relevancy, and inquisitiveness, as validated by human assessment.

Awards & Honors

  • 2021: ETS Pre-Doctoral Fellowship
  • 2014: Mitsubishi Heavy Industries Scholarship, one of 25 recipients selected from approximately 180 candidates
  • 2013: NARI-RELAYS Scholarship, ranking in approximately the top 15%
  • 2011: 1st Grade Academic Excellence Scholarship, placing 6th out of 120 candidates
  • 2010: 2nd Grade Freshman Scholarship, ranking 2nd in the entire province

Teaching & Services

Teaching:

  • 2019: Teaching Assistant, Introduction to Machine Learning

Reviewer Services:

  • NAACL-HLT 2021
  • BEA 2022 & 2023
  • EMNLP 2022 & 2023
  • ACL 2023
  • TALLIP 2023 & 2024
  • ARR 2024
  • Secondary Reviewer: EMNLP 2019 & RepL4NLP 2020

Other Services & Activities:

  • 2023: Volunteer, Circle Cat (a non-profit organization)
  • 2020 – 2021: Student Member, TTIC Diversity, Equity, and Inclusion (DEI) Committee
  • 2020: Student Member, TTIC Ph.D. Admissions Committee
  • 2011: Teaching Volunteer, Mabian Yi Autonomous County, Sichuan, China
  • 2011: Member, Student Association for Science and Technology, EE Department at Tsinghua University