Lihao Sun

Hi there! I recently graduated from the University of Chicago, with a B.S. in Computer Science and a B.A. (Honors) in Cognitive Science.

I seek to understand intelligence better. My concrete research interests center on understanding why and how LLMs demonstrate impressive capabilities and, at times, undesirable or unsafe behaviors. By applying and developing mechanistic interpretability tools, I aim to illuminate mechanisms that drive these outcomes and to design principled methods that can reliably reshape a model’s internals and outputs. Ultimately, I hope this line of inquiry helps us reflect more deeply on what it means to learn, think, and be human.

During my undergraduate years, I was fortunate to collaborate with Prof. Xuechunzi Bai, Prof. Chengzhi Mao, Dr. Andrew Lee, and Dr. Valentin Hofmann.

I’m also an indie music enthusiast, music magazine writer, startup builder, and competition math specialist. You can learn more about my life in this tab.

Email | Google Scholar | GitHub | X | Bluesky

news

Jul 19, 2025	I will be attending and presenting at ACL 2025 in Vienna, Austria, from Jul 25 to Aug 3.
Jun 10, 2025	I will be attending Y Combinator AI Startup School on Jun 16-17. See you in San Francisco!

publications

ACL (Main)

Aligned but Blind: Alignment Increases Implicit Bias by Reducing Awareness of Race

Lihao Sun, Chengzhi Mao, Valentin Hofmann, and Xuechunzi Bai

ACL (Main), 2025

arXiv Code Website
Preprint

The Geometry of Self-Verification in a Task-Specific Reasoning Model

Andrew Lee, Lihao Sun, Chris Wendler, Fernanda Viegas, and Martin Wattenberg

NeurIPS (Mech Interp Workshop), 2025

arXiv