Collections
Discover the best community collections!
Collections including paper arxiv:2404.04125 
						
					
				- 
	
	
	
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Paper • 2404.04125 • Published • 29 - 
	
	
	
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
Paper • 2404.08197 • Published • 29 - 
	
	
	
Probing the 3D Awareness of Visual Foundation Models
Paper • 2404.08636 • Published • 14 - 
	
	
	
AM-RADIO: Agglomerative Model -- Reduce All Domains Into One
Paper • 2312.06709 • Published • 2 
- 
	
	
	
Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems
Paper • 1705.04146 • Published • 1 - 
	
	
	
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?
Paper • 2404.03411 • Published • 11 - 
	
	
	
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Paper • 2404.04125 • Published • 29 - 
	
	
	
Hydragen: High-Throughput LLM Inference with Shared Prefixes
Paper • 2402.05099 • Published • 20 
- 
	
	
	
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
Paper • 2403.19651 • Published • 25 - 
	
	
	
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Paper • 2404.04125 • Published • 29 - 
	
	
	
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
Paper • 2404.08197 • Published • 29 - 
	
	
	
Gecko: Versatile Text Embeddings Distilled from Large Language Models
Paper • 2403.20327 • Published • 48 
- 
	
	
	
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Paper • 2404.04125 • Published • 29 - 
	
	
	
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
Paper • 2404.03653 • Published • 36 - 
	
	
	
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
Paper • 2404.02747 • Published • 13 - 
	
	
	
3D Congealing: 3D-Aware Image Alignment in the Wild
Paper • 2404.02125 • Published • 10 
- 
	
	
	
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper • 2401.12945 • Published • 86 - 
	
	
	
Long-form factuality in large language models
Paper • 2403.18802 • Published • 26 - 
	
	
	
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
Paper • 2403.18818 • Published • 28 - 
	
	
	
TC4D: Trajectory-Conditioned Text-to-4D Generation
Paper • 2403.17920 • Published • 18 
- 
	
	
	
Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
Paper • 2310.16527 • Published • 2 - 
	
	
	
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
Paper • 2310.02960 • Published • 1 - 
	
	
	
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 129 - 
	
	
	
Veagle: Advancements in Multimodal Representation Learning
Paper • 2403.08773 • Published • 10 
- 
	
	
	
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Paper • 2404.04125 • Published • 29 - 
	
	
	
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
Paper • 2404.03653 • Published • 36 - 
	
	
	
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
Paper • 2404.02747 • Published • 13 - 
	
	
	
3D Congealing: 3D-Aware Image Alignment in the Wild
Paper • 2404.02125 • Published • 10 
- 
	
	
	
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Paper • 2404.04125 • Published • 29 - 
	
	
	
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
Paper • 2404.08197 • Published • 29 - 
	
	
	
Probing the 3D Awareness of Visual Foundation Models
Paper • 2404.08636 • Published • 14 - 
	
	
	
AM-RADIO: Agglomerative Model -- Reduce All Domains Into One
Paper • 2312.06709 • Published • 2 
- 
	
	
	
Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems
Paper • 1705.04146 • Published • 1 - 
	
	
	
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?
Paper • 2404.03411 • Published • 11 - 
	
	
	
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Paper • 2404.04125 • Published • 29 - 
	
	
	
Hydragen: High-Throughput LLM Inference with Shared Prefixes
Paper • 2402.05099 • Published • 20 
- 
	
	
	
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
Paper • 2403.19651 • Published • 25 - 
	
	
	
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Paper • 2404.04125 • Published • 29 - 
	
	
	
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
Paper • 2404.08197 • Published • 29 - 
	
	
	
Gecko: Versatile Text Embeddings Distilled from Large Language Models
Paper • 2403.20327 • Published • 48 
- 
	
	
	
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper • 2401.12945 • Published • 86 - 
	
	
	
Long-form factuality in large language models
Paper • 2403.18802 • Published • 26 - 
	
	
	
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
Paper • 2403.18818 • Published • 28 - 
	
	
	
TC4D: Trajectory-Conditioned Text-to-4D Generation
Paper • 2403.17920 • Published • 18 
- 
	
	
	
Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
Paper • 2310.16527 • Published • 2 - 
	
	
	
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
Paper • 2310.02960 • Published • 1 - 
	
	
	
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 129 - 
	
	
	
Veagle: Advancements in Multimodal Representation Learning
Paper • 2403.08773 • Published • 10