Zerui Cheng (程泽瑞)

Ph.D. Candidate at Princeton Univ., AI Researcher

Princeton University

Evaluating AI. Generating Data. Scaling Intelligence.

Hi there, I’m Zerui Cheng (程泽瑞), a Ph.D. candidate at Princeton University advised by Prof. Pramod Viswanath. I was a student researcher at ByteDance Seed (contributor to Seed 2.0 Pro) and Tencent Hunyuan (contributor to Hy3 Preview). Before Princeton, I completed my B.Eng. in Computer Science from Yao Class at Tsinghua University, graduating summa cum laude and receiving the prestigious Yao Award.

My research focuses on two areas, Evaluation of LLMs and Agents and Synthetic Data, both of which I believe are foundational to the goal of self-evolving agents for long-horizon tasks.

Evaluation of LLMs and Agents: Measuring What Matters

LLM evaluation is critical for two reasons. First, for frontier AI labs, evaluation defines and formalizes the tasks, goals, and benchmarks that guide the direction of AI progress. Second, for the broader public, it helps users understand which models are best suited for different use cases. At its core, the philosophy of LLM evaluation concerns how we define tasks, goals, and rubrics, as well as how we ensure that evaluation results and leaderboards are unbiased, robust, and trustworthy.

Synthetic Data: Fuel for Self-Improving AI

Synthetic data is equally important because human expertise is ultimately scarce, difficult to scale, and unable to keep pace with the rapid development of AI. For AI systems to continue improving in the future, they must increasingly learn to evolve themselves. The most foundational step toward this vision is providing AI with the “fuel” it needs to progress: data. In the long run, this data should not only be consumed by AI, but also proposed, generated, and refined by AI itself.

Research Impact

My research has been featured in venues including Nature, NeurIPS, ICLR, ICML, COLM, AAAI, ACM CCS, EuroSys, and IEEE Transactions on Networking, and has contributed to the technical whitepapers of high-profile startups including Sentient, Kite AI, and Polyhedra.

Beyond the Lab

Beyond research, I am a member of the Competitive Programming Hall of Fame. I served as the President of the Yao Class Students’ Congress during my undergraduate years, and I was once a contestant on the TV show Super Brain (Jiangsu TV, Season 10).

Let’s Collaborate

I’m always open to research and industry collaborations, especially around LLM evaluation, synthetic data, and frontier AI systems. Feel free to contact me and chat!

Google Scholar profile       Curriculum Vitae

Interests
  • Evaluation of LLMs and Agents
  • Synthetic Data
  • Decentralized AI Systems
  • Blockchain & Cryptography
Education
  • Ph.D. student (2023 - now)

    Electrical and Computer Engineering, Princeton University

  • B.Eng. in Computer Science (2019 - 2023)

    Yao Class, the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University

Recent Highlights

[May 2026] (paper acceptance)

Two papers (FrontierCS, the Generalization Spectrum) accepted to ICML 2026!

One paper (ValueMine) accepted to the journal IEEE Transactions on Networking!

[Feb 2026] (new papers)

Two first-authored papers done at ByteDance Seed are online now!

[Jan 2026] (paper acceptance)

Three papers were accepted at various venues this month!

  • One paper (HLE) accepted to Nature!

  • One paper (TAO) accepted to EuroSys 2026!

  • One paper (AutoCode) accepted to ICLR 2026!

[Dec 2025] (talk)
  • Dec 4: Gave a talk on “Open-Source AI for Competitive Programming” at the OpenAGI Symposium at NeurIPS! Ticket here
[Dec 2025] (paper acceptance)

CAIA was accepted to AAAI 2026 AI4Finance and selected for oral presentation (top 10%)!

[Sep 2025] (paper acceptance)

Two papers (LiveCodeBench Pro, PeerBench) accepted to NeurIPS 2025!

Papers

For the most recent updates, please refer to my Google Scholar profile. Here are some selected publications.

High-Real-Value Technical Whitepapers for Superstar Startups

  • OML: Open, Monetizable, Loyal AI (2024, NeurIPS 2025 Lock-LLM)

  • zkBridge (ACM CCS 2022)

    • Trustless cross-chain bridges using zero-knowledge proofs
    • Foundation for the blockchain startup Polyhedra Network (valued at $1 billion by the end of 2024)
  • Kite AI Whitepaper

    • Revolutionary infrastructure design for a stablecoin payment network dedicated to AI agents
    • The technical whitepaper of Kite AI, a blockchain payment startup that secured $33M in combined seed ($15M) and Series A ($18M) funding led by PayPal Ventures

(Selected) Research in Industry Grounded in Real Practice

  • VeRA: Verified Reasoning Data Augmentation at Scale

    • Done at the ByteDance Seed team; the research question originated from the real practice of building a frontier large language model (i.e., Seed 2.0 Pro)
    • Demonstrates a new way of generating high-quality reasoning data without relying on human expertise, which is usually scarce and expensive
  • CAIA: Crypto AI Agent Benchmark

    • Advisory role for the Surf AI team, Cybertino Labs, which secured $15M in funding in its seed round
    • The paper builds the first-ever benchmark for AI agents dedicated to crypto and lays the foundation for the entire Surf AI agentic ecosystem

(Selected) Publications in Academia with High Impact

Other Publications with One-sentence Description

  • LLM and Agent Evaluation

    • FrontierCS (ICML 2026): An evolving benchmark for evolving intelligence on open problems in computer science.
    • FutureX Pro: Done at ByteDance Seed; an agent benchmark for real-life future prediction in various high-value domains.
    • PeerBench (also part of Decentralized AI, NeurIPS 2025): A new paradigm for evaluating LLMs and agents fairly, robustly, and reliably.
    • SPIN-Bench (COLM 2025): A benchmark of LLMs’ long-horizon reasoning and planning abilities.
  • Synthetic Data Generation

    • AutoCode (ICLR 2026): An agentic framework for generating tests for competitive programming problems to scale training and evaluation in coding.
    • TabularMath: Done at ByteDance Seed; a framework for generating high-quality tabular datasets for tabular foundation models.
  • Decentralized AI

    • TAO (EuroSys 2026): Verifiable and reproducible LLM inference results to ensure accountability in MLaaS.
    • Sakshi: A roadmap for an ideal decentralized AI platform where every step is transparent and auditable, ensuring that AI ultimately benefits humanity.
    • PoCW (IEEE Transactions on Networking): A paradigm for making Proof-of-Work in blockchains useful (e.g., for model training and inference) to avoid the huge waste of computational power caused by cryptocurrencies.