BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks Paper • 2412.04626 • Published Dec 5, 2024 • 14
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding Paper • 2502.01341 • Published Feb 3, 2025 • 38
Rethinking Spectral Augmentation for Contrast-based Graph Self-Supervised Learning Paper • 2405.19600 • Published May 30, 2024
Communication-Efficient Decentralized Online Continuous DR-Submodular Maximization Paper • 2208.08681 • Published Aug 18, 2022
Roughness Index for Loss Landscapes of Neural Network Models of Partial Differential Equations Paper • 2103.11069 • Published Mar 20, 2021
DREAM: Improving Video-Text Retrieval Through Relevance-Based Augmentation Using Large Foundation Models Paper • 2404.05083 • Published Apr 7, 2024
Scope: Selective Cross-modal Orchestration of Visual Perception Experts Paper • 2510.12974 • Published 10 days ago
PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models Paper • 2109.05093 • Published Sep 10, 2021 • 1
UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models Paper • 2201.05966 • Published Jan 16, 2022 • 1
Unifying Autoregressive and Diffusion-Based Sequence Generation Paper • 2504.06416 • Published Apr 8, 2025 • 3
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution Paper • 2510.08697 • Published 15 days ago • 31