Pronouns: she/her
My name in Chinese: 施 惟佳
Email: [email protected]
<aside> Github </aside>
<aside> Google Scholar </aside>
<aside> Twitter </aside>
I am Weijia Shi, a PhD student in Computer Science at the University of Washington, advised by Luke Zettlemoyer and Noah A. Smith. I was a visiting researcher at Meta AI, working with Scott Yih. Prior to UW, I graduated from UCLA with a B.S. in Computer Science and a minor in Mathematics.
My research focuses on natural language processing and machine learning. I am particularly interested in ***LM pretraining*** and retrieval-augmented models. I also study multimodal reasoning and investigate ***copyright*** and privacy risks associated with LMs.
<aside> 🌱 What’s NEW
☑️ Honored to be selected as a 2023 Machine Learning Rising Star and a 2024 Data Science Rising Star.
☑️ Our paper "Don't Hallucinate, Abstain" won the ACL 2024 🏆 Outstanding Paper Award.
☑️ Office hours: Starting November 2023, I hold weekly office hours (1–2 hours) dedicated to offering mentorship and advice to undergraduate and master's students. If you want to chat about research or grad school applications, please fill out the form.
</aside>
Please see my Google Scholar or Semantic Scholar profiles for the full list.
(*: equal contribution)
MUSE: Machine Unlearning Six-Way Evaluation for Language Models
Weijia Shi*, Jaechan Lee*, Yangsibo Huang*, Sadhika Malladi, Jieyu Zhao, Ari Holtzman, Daogao Liu, Luke Zettlemoyer, Noah A. Smith, Chiyuan Zhang
Fantastic Copyrighted Beasts and How (Not) to Generate Them
Luxi He*, Yangsibo Huang*, Weijia Shi*, Tinghao Xie, Haotian Liu, Yue Wang, Luke Zettlemoyer, Chiyuan Zhang, Danqi Chen, Peter Henderson
Evaluating Copyright Takedown Methods for Language Models
Boyi Wei*, Weijia Shi*, Yangsibo Huang*, Noah A. Smith, Chiyuan Zhang, Luke Zettlemoyer, Kai Li, Peter Henderson
NeurIPS 2024. [paper][website][code]
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
Yushi Hu*, Weijia Shi*, Xingyu Fu, Dan Roth, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Ranjay Krishna
NeurIPS 2024. [paper][website][code]
Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration
Shangbin Feng, Weijia Shi, Yike Wang, Wenxuan Ding, Vidhisha Balachandran, Yulia Tsvetkov
ACL 2024 (🏆 Outstanding Paper Award). [paper][code]
Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models
Shangbin Feng, Weijia Shi, Yuyang Bai, Vidhisha Balachandran, Tianxing He, Yulia Tsvetkov
ICLR 2024 (Oral). [paper][code]
In-Context Pretraining: Language Modeling Beyond Document Boundaries
Weijia Shi, Sewon Min, Maria Lomeli, Chunting Zhou, Margaret Li, Xi Victoria Lin, Noah A. Smith, Luke Zettlemoyer, Scott Yih, Mike Lewis
ICLR 2024 (Spotlight). [paper][code]
Detecting Pretraining Data from Large Language Models
Weijia Shi*, Anirudh Ajith*, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, Luke Zettlemoyer
ICLR 2024. [paper][website][code]
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
Sewon Min*, Suchin Gururangan*, Eric Wallace, Weijia Shi, Hannaneh Hajishirzi, Noah A. Smith, Luke Zettlemoyer
ICLR 2024 (Spotlight). [paper][code]
Trusting Your Evidence: Hallucinate Less with Context-aware Decoding
Weijia Shi*, Xiaochuang Han*, Mike Lewis, Yulia Tsvetkov, Luke Zettlemoyer, Scott Wen-tau Yih
REPLUG: Retrieval-Augmented Black-Box Language Models
Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih
One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Hongjin Su*, Weijia Shi*, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Scott Wen-tau Yih, Noah A. Smith, Luke Zettlemoyer, Tao Yu
ACL 2023. [paper][website][model (🌟 4M downloads on HuggingFace)]
Toward Human Readable Prompt Tuning: Kubrick’s The Shining is a good movie, and a good prompt too?
Weijia Shi*, Xiaochuang Han*, Hila Gonen, Ari Holtzman, Yulia Tsvetkov, Luke Zettlemoyer
EMNLP 2023. [paper]
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu*, Yushi Hu*, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi
NeurIPS 2023 (Spotlight). [paper][website][code]
kNN-Prompt: Nearest Neighbor Zero-Shot Inference
Weijia Shi, Julian Michael, Suchin Gururangan, Luke Zettlemoyer
2024/12: Princeton Language and Intelligence Center
Title: Language Models and their Data: What Matters Beyond Scale?
2024/03: Meta AI, AI reading group
Title: In-Context Pretraining: Language Modeling Beyond Document Boundaries
2024/02: Google Research
Title: Detecting Pretraining Data from Large Language Models
2024/02: Cohere
Title: In-Context Pretraining: Language Modeling Beyond Document Boundaries
2024/01: Google, NLP reading group
Title: In-Context Pretraining: Language Modeling Beyond Document Boundaries
2023/12: KAIST, IBS Data Science Group
Title: Detecting Pretraining Data from Large Language Models
2023/03: Microsoft Cognitive Service Research Group
Title: REPLUG: Retrieval-Augmented Black-Box Language Models
University of Washington, 09/2020–Present
Ph.D. student, supervised by Luke Zettlemoyer and Noah A. Smith
Meta AI, 06/2022–09/2024
Visiting Researcher, supervised by Scott Yih
University of Pennsylvania, 05/2019–09/2019
Research Intern, supervised by Dan Roth
UCLA, 04/2018–06/2020
Research Assistant, supervised by Kai-Wei Chang and Adnan Darwiche