-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 28 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
Collections
Discover the best community collections!
Collections including paper arxiv:2507.13347
-
Neural-Driven Image Editing
Paper • 2507.05397 • Published • 26 -
π^3: Scalable Permutation-Equivariant Visual Geometry Learning
Paper • 2507.13347 • Published • 64 -
MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second
Paper • 2507.10065 • Published • 24 -
From One to More: Contextual Part Latents for 3D Generation
Paper • 2507.08772 • Published • 25
-
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
Paper • 2507.13344 • Published • 56 -
π^3: Scalable Permutation-Equivariant Visual Geometry Learning
Paper • 2507.13347 • Published • 64 -
MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second
Paper • 2507.10065 • Published • 24 -
CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering
Paper • 2507.08776 • Published • 54
-
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
Paper • 2503.10437 • Published • 32 -
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k
Paper • 2503.09642 • Published • 19 -
VGGT: Visual Geometry Grounded Transformer
Paper • 2503.11651 • Published • 32 -
1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering
Paper • 2503.16422 • Published • 14
-
Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model
Paper • 2309.03550 • Published • 12 -
Memory Augmented Language Models through Mixture of Word Experts
Paper • 2311.10768 • Published • 18 -
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 237 -
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
Paper • 2311.12631 • Published • 15
-
π^3: Scalable Permutation-Equivariant Visual Geometry Learning
Paper • 2507.13347 • Published • 64 -
Voxtral
Paper • 2507.13264 • Published • 29 -
SingLoRA: Low Rank Adaptation Using a Single Matrix
Paper • 2507.05566 • Published • 112 -
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
Paper • 2507.09477 • Published • 84
-
microsoft/bitnet-b1.58-2B-4T
Text Generation • 0.8B • Updated • 11.3k • 1.21k -
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 15 -
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
Text Generation • 8B • Updated • 425 • 15 -
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 62
-
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 34 -
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 27 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 126 -
Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 22
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 28 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
Neural-Driven Image Editing
Paper • 2507.05397 • Published • 26 -
π^3: Scalable Permutation-Equivariant Visual Geometry Learning
Paper • 2507.13347 • Published • 64 -
MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second
Paper • 2507.10065 • Published • 24 -
From One to More: Contextual Part Latents for 3D Generation
Paper • 2507.08772 • Published • 25
-
π^3: Scalable Permutation-Equivariant Visual Geometry Learning
Paper • 2507.13347 • Published • 64 -
Voxtral
Paper • 2507.13264 • Published • 29 -
SingLoRA: Low Rank Adaptation Using a Single Matrix
Paper • 2507.05566 • Published • 112 -
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
Paper • 2507.09477 • Published • 84
-
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
Paper • 2507.13344 • Published • 56 -
π^3: Scalable Permutation-Equivariant Visual Geometry Learning
Paper • 2507.13347 • Published • 64 -
MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second
Paper • 2507.10065 • Published • 24 -
CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering
Paper • 2507.08776 • Published • 54
-
microsoft/bitnet-b1.58-2B-4T
Text Generation • 0.8B • Updated • 11.3k • 1.21k -
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 15 -
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
Text Generation • 8B • Updated • 425 • 15 -
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 62
-
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
Paper • 2503.10437 • Published • 32 -
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k
Paper • 2503.09642 • Published • 19 -
VGGT: Visual Geometry Grounded Transformer
Paper • 2503.11651 • Published • 32 -
1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering
Paper • 2503.16422 • Published • 14
-
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 34 -
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 27 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 126 -
Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 22
-
Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model
Paper • 2309.03550 • Published • 12 -
Memory Augmented Language Models through Mixture of Word Experts
Paper • 2311.10768 • Published • 18 -
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 237 -
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
Paper • 2311.12631 • Published • 15