
Zichen

lkevinzc

https://lkevinzc.github.io/

Recent Activity

upvoted a paper about 3 hours ago

Defeating the Training-Inference Mismatch via FP16

Paper • 2510.26788 • Published 4 days ago • 11

upvoted 2 papers 27 days ago

Imperceptible Jailbreaking against Large Language Models

Paper • 2510.05025 • Published 28 days ago • 33

GEM: A Gym for Agentic LLMs

Paper • 2510.01051 • Published Oct 1 • 87

upvoted 3 papers about 1 month ago

upvoted a paper 4 months ago

SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Paper • 2506.24119 • Published Jun 30 • 50

upvoted 4 papers 5 months ago

SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis

Paper • 2506.02096 • Published Jun 2 • 52

Fostering Video Reasoning via Next-Event Prediction

Paper • 2505.22457 • Published May 28 • 29

Reinforcing General Reasoning without Verifiers

Paper • 2505.21493 • Published May 27 • 26

Lifelong Safety Alignment for Language Models

Paper • 2505.20259 • Published May 26 • 23

upvoted a paper 6 months ago

Optimizing Anytime Reasoning via Budget Relative Policy Optimization

Paper • 2505.13438 • Published May 19 • 36

upvoted 2 papers 7 months ago

Efficient Process Reward Model Training via Active Learning

Paper • 2504.10559 • Published Apr 14 • 13

Understanding R1-Zero-Like Training: A Critical Perspective

Paper • 2503.20783 • Published Mar 26 • 56

upvoted a collection 7 months ago

🌾Oat-Zero: Understanding R1-Zero-Like Training

Collection

5 items • Updated Apr 10 • 7

upvoted a paper 8 months ago

Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs

Paper • 2502.12982 • Published Feb 18 • 19

upvoted a paper 12 months ago

Sample-Efficient Alignment for LLMs

Paper • 2411.01493 • Published Nov 3, 2024 • 12

upvoted a collection over 1 year ago

💡 DICE

Collection

Self-alignment with DPO Implicit Rewards • 5 items • Updated Jul 28, 2024 • 9

upvoted 2 papers over 1 year ago

RegMix: Data Mixture as Regression for Language Model Pre-training

Paper • 2407.01492 • Published Jul 1, 2024 • 40

Bootstrapping Language Models with DPO Implicit Rewards

Paper • 2406.09760 • Published Jun 14, 2024 • 40
