Junxiao Yang

yangjunxiao2021

https://yangjunx21.github.io/

yangjunx21

AI & ML interests

Alignment/AI safety

Recent Activity

upvoted a paper 5 days ago

It Takes Two: Your GRPO Is Secretly DPO

Paper • 2510.00977 • Published 24 days ago • 31

upvoted a collection 5 days ago

Agent & RL

Collection

47 items • Updated 15 days ago • 13

upvoted 2 papers 5 days ago

Glyph: Scaling Context Windows via Visual-Text Compression

Paper • 2510.17800 • Published 5 days ago • 55

A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning

Paper • 2510.15444 • Published 9 days ago • 135

upvoted a paper 15 days ago

Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL

Paper • 2508.07976 • Published Aug 11 • 51

upvoted a paper 16 days ago

In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Paper • 2510.05592 • Published 19 days ago • 91

upvoted a paper 17 days ago

Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs

Paper • 2509.24107 • Published 27 days ago • 74

upvoted 2 papers 27 days ago

BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents

Paper • 2504.12516 • Published Apr 16 • 1

BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese

Paper • 2504.19314 • Published Apr 27 • 7

upvoted 3 papers about 1 month ago

Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated

Paper • 2509.05739 • Published Sep 6 • 2

Can Understanding and Generation Truly Benefit Together -- or Just Coexist?

Paper • 2509.09666 • Published Sep 11 • 33

SimpleQA Verified: A Reliable Factuality Benchmark to Measure Parametric Knowledge

Paper • 2509.07968 • Published Sep 9 • 14

upvoted a paper about 2 months ago

Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers

Paper • 2509.03059 • Published Sep 3 • 24

upvoted 5 papers 5 months ago

Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!

Paper • 2505.15656 • Published May 21 • 15

AdaptThink: Reasoning Models Can Learn When to Think

Paper • 2505.13417 • Published May 19 • 82

upvoted an article 5 months ago

arXiv practical tips: how to get more attention for your paper?

Article • Jul 8, 2024 • 14

upvoted a paper 5 months ago

AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning

Paper • 2505.11896 • Published May 17 • 58
