- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 28
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23
Collections including paper arxiv:2507.07104

- Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles
  Paper • 2505.23590 • Published • 25
- How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning
  Paper • 2505.24273 • Published • 4
- Accelerating Diffusion LLMs via Adaptive Parallel Decoding
  Paper • 2506.00413 • Published • 9
- DINGO: Constrained Inference for Diffusion LLMs
  Paper • 2505.23061 • Published • 31

- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 167
- Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
  Paper • 2505.22453 • Published • 46
- UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
  Paper • 2505.23380 • Published • 22
- More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
  Paper • 2505.21523 • Published • 13

- Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
  Paper • 2401.09048 • Published • 10
- Improving fine-grained understanding in image-text pre-training
  Paper • 2401.09865 • Published • 18
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
  Paper • 2401.10891 • Published • 62
- Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
  Paper • 2401.13627 • Published • 77

- Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
  Paper • 2507.09477 • Published • 84
- Replacing thinking with tool usage enables reasoning in small language models
  Paper • 2507.05065 • Published • 15
- AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research
  Paper • 2507.13300 • Published • 19
- Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models
  Paper • 2507.07104 • Published • 45

- Large Language Models are Locally Linear Mappings
  Paper • 2505.24293 • Published • 14
- Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models
  Paper • 2507.07104 • Published • 45
- KV Cache Steering for Inducing Reasoning in Small Language Models
  Paper • 2507.08799 • Published • 40
- A Survey of Reinforcement Learning for Large Reasoning Models
  Paper • 2509.08827 • Published • 183

- MLLM-as-a-Judge for Image Safety without Human Labeling
  Paper • 2501.00192 • Published • 31
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 107
- Xmodel-2 Technical Report
  Paper • 2412.19638 • Published • 26
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  Paper • 2412.18925 • Published • 104