Machine Learning Systems • AI Research

Zhengxu (Jason) Yan

M.S. student at Stanford building feedback-driven learning systems, verifiable code benchmarks, and research pipelines for scientific discovery.

My recent work spans scalable model improvement, structured critique, diffusion transformer serving, and dataset intelligence for NLP. I’m interested in turning ambitious ML ideas into robust systems and clear empirical results.


Stanford, CA • Research: LLMs, ML systems, RL

Current Focus

Scaling feedback descent and iterative refinement
Verifiable code generation and benchmark design
Dataset metadata and claim extraction for NLP papers

Selected Work

Research highlights

Representative papers and projects that best capture the direction of my recent research.

Accepted at ICLR 2026

VERINA: Benchmarking Verifiable Code Generation

Built a benchmark and evaluation pipeline for measuring whether generated code is actually correct and verifiable, targeting more reliable code agents.


Under review at ICML 2026

DiT-Serve

Designed serving improvements for diffusion transformers using attention optimizations and continuous batching for efficient high-throughput generation.

Preprint

NoveltyRank

Explores how to estimate conceptual novelty in AI papers by extracting and modeling structured signals from scientific content.


Current Roles

What I’m working on now

Current research appointments and the systems questions that anchor them.

Graduate Research Assistant

IRIS Lab, Stanford AI Lab

Nov 2025 - Present

Investigating feedback descent, structured critique models, and multi-round refinement pipelines for scalable model improvement.

Graduate Research Assistant

SALT Lab, Stanford AI Lab

Nov 2025 - Present

Building large-scale pipelines to extract dataset claims and structured metadata from NLP papers, with a focus on novelty and adoption signals.

Research Assistant

Sky Computing Lab, UC Berkeley

Feb 2024 - Oct 2025

Led attention-system integration and batching strategy work for diffusion transformer serving, pushing toward scalable video generation systems.

ML Engineer Lead

UC Berkeley School of Public Health

Jan 2024 - Sep 2025

Built automation and LLM-assisted review pipelines for infectious disease preprints in collaboration with Berkeley and UCSF researchers.

Selected Projects

Systems + applied ML

A few projects outside formal publications that reflect how I build.

Multi-Agent LLM Trading System

2024

Integrated specialized agents, multimodal inputs, layered memory, and retrieval to push beyond standard market prediction baselines.

Pintos Operating System

2023

Extended a teaching OS across memory, scheduling, file systems, networking, and security, with a focus on systems fundamentals.

Secure Client Application

2023

Built a secure Go client with cryptographic support for authentication, file sharing, and revocation.

Education

Training

Stanford University

Master of Science in Computer Science

Expected April 2027 • GPA 4.15/4.00

University of California, Berkeley

B.A. in Computer Science and Data Science

Aug 2021 - May 2025 • GPA 3.95/4.00 • High Distinction

Selected coursework: optimization, probability for data science, security, operating systems, algorithms, computer vision, AI, machine learning, and LLM agents.