66 195

Xin Li

lixin67 · WilliamLeeBravo

AI & ML interests

None yet

Recent Activity

liked a model 9 days ago

Qwen/Qwen3-VL-4B-Instruct

Organizations

None yet

upvoted a paper 7 days ago

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

Paper • 2510.14979 • Published 8 days ago • 64

upvoted a paper 9 days ago

SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models

Paper • 2510.12784 • Published 10 days ago • 19

upvoted a collection 20 days ago

Qwen3-VL

25 items • Updated 3 days ago • 316

upvoted a collection 26 days ago

LLaVA-OneVision-1.5

https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5 • 9 items • Updated 3 days ago • 16

upvoted 2 papers about 2 months ago

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

Paper • 2509.02544 • Published Sep 2 • 121

"Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries

Paper • 2508.15752 • Published Aug 21 • 7

upvoted a collection about 2 months ago

UI Agent

a collection of algorithmic agents for user interfaces/interactions, program synthesis, and robotics • 423 items • Updated about 4 hours ago • 63

upvoted a paper about 2 months ago

An Illusion of Progress? Assessing the Current State of Web Agents

Paper • 2504.01382 • Published Apr 2 • 4

upvoted a collection about 2 months ago

InternVL3.5

This collection includes all released checkpoints of InternVL3.5, covering different training stages (e.g., Pretraining, SFT, MPO, Cascade RL). • 54 items • Updated 26 days ago • 99

upvoted 2 papers 2 months ago

Deep Think with Confidence

Paper • 2508.15260 • Published Aug 21 • 87

Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published Aug 21 • 254

upvoted a collection 2 months ago

MolmoAct Data Mixture

All datasets for the MolmoAct (Multimodal Open Language Model for Action) release. • 4 items • Updated Sep 6 • 15

upvoted 4 papers 2 months ago

Test-Time Reinforcement Learning for GUI Grounding via Region Consistency

Paper • 2508.05615 • Published Aug 7 • 22

WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Paper • 2508.05748 • Published Aug 7 • 136

Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal

Paper • 2508.05988 • Published Aug 8 • 19

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Paper • 2508.06471 • Published Aug 8 • 186

upvoted 2 papers 3 months ago

Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models

Paper • 2507.12566 • Published Jul 16 • 14

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2 • 184

upvoted an article 4 months ago

SmolLM3: smol, multilingual, long-context reasoner

Article • Jul 8 • 699

upvoted a paper 4 months ago

Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

Paper • 2507.07982 • Published Jul 10 • 33