Zerui Cheng (程泽瑞)

Welcome to Zerui (Marco) Cheng’s homepage~

The pronunciation of “Zerui” is similar to “the ray”. You can also call me Marco instead :)

  • SHORT BIOGRAPHY

    I’m a 4th-year undergraduate in the Honored Yao Class of IIIS, Tsinghua University, and the recipient of the Yao Award, the highest honor for undergraduate achievements. I am a curious, proactive, and aspiring student with a broad range of research interests, especially techniques and theories with the potential for significant real-world impact. My current research lies at the intersection of blockchain, applied cryptography, data processing and analysis, and mechanism design. I’m also an experienced competitive programmer with notable achievements.

    Check my CV here for more information :).

  • RECENT HIGHLIGHTS

    • [Jan 2023] I’m fortunate to be selected as a participant in this year’s TV program <Super Brain: Season 10> on the Jiangsu channel (broadcast at 21:10 every Friday, Beijing time). Stay tuned for my performance!

    • [Nov 2022] After serving as the vice president of IIIS Students' Congress for a year, I was honored to receive the most votes in the election and become a member of the newly-elected Presidium of IIIS Students' Congress!

    • [Oct 2022] I felt honored to receive this year’s Scholarship for Comprehensive Excellence at Tsinghua University!

    • [Sep 2022] I felt honored and extremely excited to receive this year’s Yao Award for my undergraduate achievements!

    • [Aug 2022] Our paper “zkBridge: Trustless Cross-chain Bridges Made Practical” was accepted to ACM CCS 2022!

    • [Aug 2022] I participated in IC3 Blockchain Camp, and our group “Client Security” won 1st place in the Hackathon!

    • [Jun 2022] I participated in Google Code Jam 2022 and was happy to finish in the top 0.5% among all participants worldwide.

    • [May 2022] I served as a reviewer for 2 papers in the IEEE JSAC Blockchain Special Issue.

  • SCIENTIFIC RESEARCH

    • My research interests include blockchain, mechanism and incentive design, and applied cryptography.

    • Since my sophomore year, I have conducted research on blockchain at Tsinghua University under the supervision of Prof. Zhixuan Fang, which was my introduction to scientific research. During this period, I completed my first academic research paper (on a framework linking crowdsourcing with Proof-of-Work to serve as a new Proof-of-Useful-Work scheme).

    • I was fortunate to be invited to serve as a reviewer for IEEE JSAC Special Issue (Intelligent Blockchain) 2022.

    • I was fortunate to be accepted as a research intern on blockchain in the spring and summer of 2022, advised by Prof. Fan Zhang at Duke University and Prof. Dawn Song at the University of California, Berkeley. I worked on two projects during the internship: the analysis and simulation of side contracts against EIP-1559 in Ethereum, and the design and implementation of cross-chain bridge protocols (the resulting paper was accepted to ACM CCS 2022 and is my first publication).

  • COMPETITIVE PROGRAMMING

    • I am also an experienced and proud competitive programmer (What is competitive programming?). I consider randomized algorithms and the optimization of search algorithms to be my strongest skills.

    • In offline contests, I have won a Gold Medal in an ICPC (International Collegiate Programming Contest) Regional, a Silver Medal in the ICPC East Asia Continental Final, and two Gold Medals in CCPC (Chinese Collegiate Programming Contest). I’m also a two-time silver medalist of the Chinese National Olympiad in Informatics (in 2017 and 2018).

    • In online contests, I rank 16/182781 worldwide (and 2nd in China) on the well-regarded online programming platform HackerRank; see the leaderboard for more details. I also compete on Codeforces (see my profile here) and CSAcademy (see my profile here).

    • Besides, I’m also an experienced problem setter for competitive programming contests. I have set and tested a number of problems for various contests. For example, I’m the main author of Codeforces Round #447 (problemset here) and a problem setter of the Chinese National Olympiad in Informatics in 2021.

  • SOCIAL WORK

    • I’m now a member of the Presidium of IIIS Yao Class Students' Congress (a superset of Students' Union) at Tsinghua University, where I served as the vice president from Oct 2021 to Nov 2022.

    • I’m a member of the Algorithm Association at Tsinghua University, and I have served as a problem setter and coordinator for a number of contests, including the National Olympiad in Informatics (2021) and the Tsinghua University Programming Contest (2020, 2021, 2022).

    • I was a volunteer for the 110th anniversary of Tsinghua University.

  • AVOCATION

    • Fan of NBA, Formula 1, and European Soccer. My favorite sporting stars include Stephen Curry (Golden State Warriors, NBA), Lewis Hamilton (Mercedes, F1), Charles Leclerc (Ferrari, F1) and Marco Reus (BVB Dortmund, Bundesliga).

    • Fan of stand-up comedy (including both Chinese-style cross-talk and American-style talk shows) and musicals. My favorite stars include Guo Degang and Hu Lan (Hooligan). Favorite musicals include “Hamilton” and “Les Misérables”.

    • Fan of music, especially music from musicals or in the “Gu-Feng” (ancient Chinese) style.

    • Amateur but enthusiastic basketball player, although I don’t have strong skills :(

  • Last revised on: Jan 11, 2023
[Oct 2025] (papers and acceptance notifications)

Several papers that I contributed to are now online and will be presented at different venues in the near future!

First-author papers:

  • Paper 1: PeerBench Paradigm: We analyze the systematic challenges facing today's AI benchmark paradigm (data contamination, collusion, overfitting, etc.) and propose PeerBench, a novel mechanism based on community contribution and reputation to reliably and efficiently measure data quality and build fair leaderboards. Our vision is to return AI evaluation to its role as a public good, aligning tech development with all humanity’s needs, not just those of a few giants. [Accepted to NeurIPS 2025]
  • Paper 2: OML Primitive: Are openness and commercial value mutually exclusive? In this paper, we go one step further than Sentient's original OML whitepaper from last year. We formalize the OML framework, exploring a path where models are open-access but technical safeguards prevent misuse. OML offers a blueprint for the sustainable, open governance and operation of next-gen AI. [Accepted to NeurIPS 2025 Lock-LLM]

Co-first-author papers:

  • Paper 3: CAIA (Crypto AI Agent Benchmark): The first AI agent benchmark in crypto and web3. Our results show that models aren’t yet reliable in this high-stakes, high-misinformation adversarial domain, and that we are still far from an ideal world where AI can reliably control users' wallets and manage real funds without risk. [Accepted to ICAIF 2025 AI4F (poster), AI-R2D2 (oral)]
  • Paper 4: AutoCode: The follow-up work to LiveCodeBench Pro crafted by the LiveCodeBench Pro dream team. We create a robust and efficient AI system that auto-generates coding problems to solve the data scarcity bottleneck.

Co-author papers:

  • Paper 5: NAO (Nondeterminism-Aware Optimistic Verification for Floating-Point Neural Networks): A crucial step for Decentralized AI, ensuring that AI inference results are reproducible and verifiable so that the rights of end users are protected. It resolves the bottleneck that we encountered in our Sakshi paper 2 years ago and brings us closer to our vision of a decentralized AI platform.
  • Paper 6: Kite AI Whitepaper: I co-authored the whitepaper for Kite AI as a research collaborator. Kite AI is building a native payment infrastructure for AI agents, and we depict a vision in which agents transact autonomously with cryptographic accountability and traceability. Kite AI has raised $33 million from top-tier investors, including PayPal, General Catalyst, Coinbase Ventures, and leading blockchain foundations.

Among these, PeerBench and LiveCodeBench Pro will be presented at the NeurIPS 2025 main conference in San Diego on Dec 3; CAIA will be presented at ICAIF 2025 in Singapore (AI4F on Nov 15, AI-R2D2 on Nov 16); and OML Primitive will be presented at NeurIPS 2025 Lock-LLM on Dec 6. Stay tuned!

[Oct 2025] (talk)
[Aug 2025] (talks)
[Jun 2025] (papers)

LiveCodeBench Pro, the AI benchmark paper that I co-first-authored, is online now!

  • LiveCodeBench Pro: We collaborated with elite competitive programmers to launch a continuously updated benchmark that precisely evaluates model capabilities on dynamic, high-difficulty coding tasks. The paper has been covered by MIT Tech Review and has already accumulated more than 1 million views on X. [Accepted to NeurIPS 2025]
[Jun 2025] (talk)
[May 2025] (personal update)
  • Passed my Ph.D. general exam. I’m officially a Ph.D. candidate now!
  • Thank you to all my committee members: Prof. Chi Jin, Prof. Sanjeev Kulkarni, and Prof. Pramod Viswanath!
[Apr 2025] (poster presentation)
  • Poster presentation on OML at Citadel Securities PhD Summit 2025. Thank you Citadel Securities!
[Mar 2025] (papers)

Two AI benchmark papers that I co-authored are online now!

[Sep 2024] (paper)
  • The whitepaper on OML (Open, Monetizable and Loyal AI) is live. Don’t hesitate to check it out!
  • Here is the link to the whitepaper.
[Aug 2024] (personal update)
  • Started my one-month internship as a Quantitative Researcher at JQ Investment.
[May 2024] (personal update)
  • Started my internship as an AI fellow at Sentient.

Evaluation of LLMs and Agents: Measuring What Matters

LLM evaluation is critical for two reasons. First, for frontier AI labs, evaluation defines and formalizes the tasks, goals, and benchmarks that guide the direction of AI progress. Second, for the broader public, it helps users understand which models are best suited for different use cases. At its core, the philosophy of LLM evaluation concerns how we define tasks, goals, and rubrics, as well as how we ensure that evaluation results and leaderboards are unbiased, robust, and trustworthy.

Synthetic Data: Fuel for Self-Improving AI

Synthetic data is equally important because human expertise is ultimately scarce, difficult to scale, and unable to keep pace with the rapid development of AI. For AI systems to continue improving in the future, they must increasingly learn to evolve themselves. The most foundational step toward this vision is providing AI with the “fuel” it needs to progress: data. In the long run, this data should not only be consumed by AI, but also proposed, generated, and refined by AI itself.

Zerui Cheng (程泽瑞)
Ph.D. Candidate at Princeton Univ., LLM and AI Researcher