
Le Yu

vanillaOVO

https://yule-buaa.github.io/

yule-BUAA

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper 25 days ago

Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window

Paper • 2510.08276 • Published 26 days ago • 9

upvoted 3 papers 3 months ago

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 156

Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24 • 306

RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback

Paper • 2507.15024 • Published Jul 20 • 14

upvoted a collection 4 months ago

Qwen3

Collection • 84 items • Updated Aug 6 • 1.39k

upvoted 3 papers 5 months ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 422

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 262

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2 • 185

upvoted 2 papers 6 months ago

Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 308

WorldPM: Scaling Human Preference Modeling

Paper • 2505.10527 • Published May 15 • 34

upvoted an article 9 months ago

Putting RL back in RLHF

Article • Jun 12, 2024 • 106

upvoted a paper 10 months ago

CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

Paper • 2501.01257 • Published Jan 2 • 52

upvoted a paper 11 months ago

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 376

upvoted a paper about 1 year ago

A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models

Paper • 2410.13841 • Published Oct 17, 2024 • 17

upvoted a collection over 1 year ago

Llama 3.1

Collection • This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Dec 6, 2024 • 694

upvoted an article over 1 year ago

Llama 3.1 - 405B, 70B & 8B with multilinguality and long context

Article • Jul 23, 2024 • 238

upvoted a collection over 1 year ago

Qwen2

Collection • Qwen2 language models, including pretrained and instruction-tuned models of 5 sizes: 0.5B, 1.5B, 7B, 57B-A14B, and 72B • 39 items • Updated Jul 21 • 371

upvoted 2 articles over 1 year ago

Fine-tune Llama 3 with ORPO

Article • Apr 22, 2024 • 239

Merge Large Language Models with mergekit

Article • Jan 9, 2024 • 144

upvoted a paper over 1 year ago

DoRA: Weight-Decomposed Low-Rank Adaptation

Paper • 2402.09353 • Published Feb 14, 2024 • 29
