I am Zerui Cheng (程泽瑞), a Ph.D. candidate at Princeton University advised by Prof. Pramod Viswanath. My research focuses on Evaluation of LLMs and Agents, and on Synthetic Data, two pillars for building self-evolving agents for long-horizon tasks.
I am currently a Quant Research Intern at Citadel Securities, and was previously a Student Researcher at ByteDance Seed and Tencent Hunyuan, where I contributed to Seed 2.0 Pro and Hy3 Preview. Before Princeton, I received my B.Eng. in Computer Science from the Yao Class at Tsinghua University, graduating summa cum laude and receiving the Yao Award.
My research has been published in Nature and leading venues including NeurIPS, ICLR, ICML, COLM, AAAI, ACM CCS, EuroSys, and IEEE Transactions on Networking. I am also a core contributor to the technical whitepapers for Sentient, Kite AI, and PolyHedra.
My work has been covered by MIT Technology Review (on the AI evaluation crisis) and Sciences et Avenir (on the philosophy of AI evaluation). Beyond research, I am a member of the Competitive Programming Hall of Fame, was a contestant on Season 10 of the TV show Super Brain, and previously served as President of the Yao Class Students' Congress.
Ph.D. student (2023 - now)
Electrical and Computer Engineering, Princeton University
B.Eng. in Computer Science (2019 - 2023)
Yao Class, the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University
I was honored to be interviewed by Sciences et Avenir, the leading popular science magazine in France, which later featured PeerBench and my broader views on the philosophy of LLM evaluation in the article Que valent les comparateurs d’IA ? (June 2026).
Two papers (FrontierCS, the Generalization Spectrum) accepted to ICML 2026!
One paper (ValueMine) accepted to the journal IEEE Transactions on Networking!
Two first-authored papers done at ByteDance Seed are online now!
Three papers were accepted at various venues this month!
One paper (HLE) accepted to Nature!
One paper (TAO) accepted to EuroSys 2026!
One paper (AutoCode) accepted to ICLR 2026!
CAIA accepted to AAAI 2026 AI4Finance and selected for oral presentation (top 10%)!
Two papers (LiveCodeBench Pro, PeerBench) accepted to NeurIPS 2025!
LiveCodeBench Pro was covered by MIT Technology Review in their article Can we fix AI’s evaluation crisis?
For the most recent updates, please refer to my Google Scholar profile. Here are some selected publications.
OML: Open, Monetizable, Loyal AI (2024, NeurIPS 2025 Lock-LLM)
zkBridge (ACM CCS 2022)
VeRA: Verified Reasoning Data Augmentation at Scale
CAIA: Crypto AI Agent Benchmark
LiveCodeBench Pro (NeurIPS 2025) - Comprehensive, hard, and contamination-free code generation benchmark
Humanity’s Last Exam (2025) - Ultimate test for AI capabilities
LLM and Agent Evaluation
Synthetic Data Generation
Decentralized AI