- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 28
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23
Collections
Collections including paper arxiv:2412.09624

- GenEx: Generating an Explorable World
  Paper • 2412.09624 • Published • 97
- Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
  Paper • 2501.02790 • Published • 9
- Who's Your Judge? On the Detectability of LLM-Generated Judgments
  Paper • 2509.25154 • Published • 29
- TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
  Paper • 2509.25760 • Published • 54

- ReZero: Enhancing LLM search ability by trying one-more-time
  Paper • 2504.11001 • Published • 15
- FonTS: Text Rendering with Typography and Style Controls
  Paper • 2412.00136 • Published • 1
- GenEx: Generating an Explorable World
  Paper • 2412.09624 • Published • 97
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Paper • 2412.13663 • Published • 157

- UnCommon Objects in 3D
  Paper • 2501.07574 • Published • 13
- Bringing Objects to Life: 4D generation from 3D objects
  Paper • 2412.20422 • Published • 40
- GenEx: Generating an Explorable World
  Paper • 2412.09624 • Published • 97
- GameFactory: Creating New Games with Generative Interactive Videos
  Paper • 2501.08325 • Published • 67

- GenEx: Generating an Explorable World
  Paper • 2412.09624 • Published • 97
- Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation
  Paper • 2412.09428 • Published • 7
- BrushEdit: All-In-One Image Inpainting and Editing
  Paper • 2412.10316 • Published • 35
- FashionComposer: Compositional Fashion Image Generation
  Paper • 2412.14168 • Published • 16

- GenEx: Generating an Explorable World
  Paper • 2412.09624 • Published • 97
- IamCreateAI/Ruyi-Mini-7B
  Image-to-Video • Updated • 163 • 612
- Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation
  Paper • 2412.06016 • Published • 20
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 108

- HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
  Paper • 2411.02959 • Published • 70
- "Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization
  Paper • 2411.02355 • Published • 51
- CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation
  Paper • 2410.23090 • Published • 55
- RARe: Retrieval Augmented Retrieval with In-Context Examples
  Paper • 2410.20088 • Published • 4