Exploring Conditions for Diffusion models in Robotic Control • Paper • 2510.15510 • Published Oct 17 • 39
Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs • Paper • 2510.13251 • Published Oct 15 • 12
HyperCLOVA X SEED • Collection • HyperCLOVA X SEED is NAVER's lightweight open-source lineup with a strong focus on Korean language performance • 4 items • Updated Jul 22 • 28
ProLIP • Collection • Official ProLIP weights, Probabilistic Language-Image Pre-Training (ICLR 2025) • 7 items • Updated Apr 18 • 10
MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation • Paper • 2411.19067 • Published Nov 28, 2024 • 8
Cosmos-Tokenizer • Collection • A suite of image and video tokenizers • 13 items • Updated about 13 hours ago • 42
Unified Speech-Text Pretraining for Spoken Dialog Modeling • Paper • 2402.05706 • Published Feb 8, 2024 • 7
Rethinking Spatial Dimensions of Vision Transformers • Paper • 2103.16302 • Published Mar 30, 2021 • 1
RDNet • Collection • DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs [ECCV 2024] • 9 items • Updated Oct 16, 2024 • 3
rope-vit • Collection • Rotary Position Embedding for Vision Transformer [ECCV 2024] • 22 items • Updated Oct 16, 2024 • 4
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs • Paper • 2403.19588 • Published Mar 28, 2024 • 4
Rotary Position Embedding for Vision Transformer • Paper • 2403.13298 • Published Mar 20, 2024 • 6