
zhu

xuekai

AI & ML interests

None yet

Recent Activity

commented on a paper 7 days ago

FlowRL: Matching Reward Distributions for LLM Reasoning

updated a model 8 days ago

xuekai/FlowRL-DeepSeek-7B-code

published a model 8 days ago

xuekai/FlowRL-DeepSeek-7B-code

Organizations

upvoted a paper 26 days ago

ASPO: Asymmetric Importance Sampling Policy Optimization

Paper • 2510.06062 • Published 27 days ago • 13

upvoted a paper 28 days ago

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Paper • 2510.04618 • Published 29 days ago • 112

upvoted 2 papers about 1 month ago

From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones

Paper • 2509.25123 • Published Sep 29 • 19

Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published Aug 21 • 255

upvoted 4 papers about 2 months ago

upvoted 5 papers 5 months ago

Reasoning with Exploration: An Entropy Perspective

Paper • 2506.14758 • Published Jun 17 • 30

RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling

Paper • 2506.08672 • Published Jun 10 • 30

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

Paper • 2505.22617 • Published May 28 • 130

Discrete Markov Bridge

Paper • 2505.19752 • Published May 26 • 17

DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving

Paper • 2505.16278 • Published May 22 • 5

upvoted 2 papers 6 months ago

Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space

Paper • 2505.13308 • Published May 19 • 27

TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 120

upvoted a paper 7 months ago

Video-T1: Test-Time Scaling for Video Generation

Paper • 2503.18942 • Published Mar 24 • 90

upvoted a paper 8 months ago

Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models

Paper • 2503.11224 • Published Mar 14 • 28

upvoted an article 9 months ago

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Article • Feb 7 • 243

upvoted a paper 9 months ago

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

Paper • 2501.18362 • Published Jan 30 • 23

upvoted an article 10 months ago

Putting RL back in RLHF

Article • Jun 12, 2024 • 106
