Video - a Carlosvirella100 Collection

Video

updated 7 days ago

GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning

Paper • 2311.12631 • Published Nov 21, 2023 • 15
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11, 2024 • 56
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step

Paper • 2504.01956 • Published Apr 2 • 40
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding

Paper • 2506.23219 • Published Jun 29 • 7
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

Paper • 2507.06181 • Published Jul 8 • 43
Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky

Paper • 2507.03336 • Published Jul 4 • 5
GTA1: GUI Test-time Scaling Agent

Paper • 2507.05791 • Published Jul 8 • 26
LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS

Paper • 2507.07136 • Published Jul 9 • 37
Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective

Paper • 2507.08801 • Published Jul 11 • 30
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

Paper • 2507.10524 • Published Jul 14 • 70
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?

Paper • 2507.12415 • Published Jul 16 • 42
OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique

Paper • 2507.09075 • Published Jul 11 • 15
REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once

Paper • 2507.10541 • Published Jul 14 • 29
Lizard: An Efficient Linearization Framework for Large Language Models

Paper • 2507.09025 • Published Jul 11 • 18
AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs

Paper • 2507.08616 • Published Jul 11 • 13
EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes

Paper • 2507.11407 • Published Jul 15 • 57
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner

Paper • 2507.13332 • Published Jul 17 • 48
RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization

Paper • 2507.12142 • Published Jul 16 • 36
FLEXITOKENS: Flexible Tokenization for Evolving Language Models

Paper • 2507.12720 • Published Jul 17 • 9
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities

Paper • 2507.13158 • Published Jul 17 • 24
Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers

Paper • 2507.08422 • Published Jul 11 • 36
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization

Paper • 2507.15061 • Published Jul 20 • 59
Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling

Paper • 2507.11061 • Published Jul 15 • 37
Gaussian Splatting with Discretized SDF for Relightable Assets

Paper • 2507.15629 • Published Jul 21 • 23
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning

Paper • 2507.16784 • Published Jul 22 • 120
Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention

Paper • 2507.17745 • Published Jul 23 • 33
Deep Researcher with Test-Time Diffusion

Paper • 2507.16075 • Published Jul 21 • 64
ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

Paper • 2507.22827 • Published Jul 30 • 98
On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective

Paper • 2507.23632 • Published Jul 31 • 6
Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models

Paper • 2508.00819 • Published Aug 1 • 62
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training

Paper • 2508.00414 • Published Aug 1 • 91
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4 • 257
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward

Paper • 2508.03686 • Published Aug 5 • 36
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2 • 236
Efficient Agents: Building Effective Agents While Reducing Cost

Paper • 2508.02694 • Published Jul 24 • 85
Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published Aug 5 • 70
CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search

Paper • 2508.02091 • Published Aug 4 • 13
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 262
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization

Paper • 2508.05731 • Published Aug 7 • 25
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization

Paper • 2508.07629 • Published Aug 11 • 41
Adapting Vision-Language Models Without Labels: A Comprehensive Survey

Paper • 2508.05547 • Published Aug 7 • 11
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Paper • 2508.06471 • Published Aug 8 • 186
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent

Paper • 2508.06600 • Published Aug 8 • 39
Reinforcement Learning in Vision: A Survey

Paper • 2508.08189 • Published Aug 11 • 28
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11 • 47
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory

Paper • 2508.09736 • Published Aug 13 • 56
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens

Paper • 2508.05305 • Published Aug 7 • 46
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models

Paper • 2508.09968 • Published Aug 13 • 15
A Survey on Diffusion Language Models

Paper • 2508.10875 • Published Aug 14 • 34
Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs

Paper • 2508.14896 • Published Aug 20 • 22
XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization

Paper • 2508.10395 • Published Aug 14 • 42
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

Paper • 2508.14444 • Published Aug 20 • 36
Deep Think with Confidence

Paper • 2508.15260 • Published Aug 21 • 87
Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization

Paper • 2508.14811 • Published Aug 20 • 40
UQ: Assessing Language Models on Unsolved Questions

Paper • 2508.17580 • Published Aug 25 • 15
T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation

Paper • 2508.17472 • Published Aug 24 • 26
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

Paper • 2508.18773 • Published Aug 26 • 15
Autoregressive Universal Video Segmentation Model

Paper • 2508.19242 • Published Aug 26 • 28
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies

Paper • 2508.20072 • Published Aug 27 • 30
Self-Rewarding Vision-Language Model via Reasoning Decomposition

Paper • 2508.19652 • Published Aug 27 • 84
SpotEdit: Evaluating Visually-Guided Image Editing Methods

Paper • 2508.18159 • Published Aug 25 • 3
ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks

Paper • 2508.15804 • Published Aug 14 • 15
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Paper • 2508.20751 • Published Aug 28 • 89
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Paper • 2508.17445 • Published Aug 24 • 80
Mixture of Contexts for Long Video Generation

Paper • 2508.21058 • Published Aug 28 • 34
VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26 • 123
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning

Paper • 2508.20096 • Published Aug 27 • 36
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles

Paper • 2508.16072 • Published Aug 22 • 4
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

Paper • 2508.21112 • Published Aug 28 • 75
UItron: Foundational GUI Agent with Advanced Perception and Planning

Paper • 2508.21767 • Published Aug 29 • 12
Efficient Code Embeddings from Code Generation Models

Paper • 2508.21290 • Published Aug 29 • 18
TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training

Paper • 2508.17677 • Published Aug 25 • 14
CLIPSym: Delving into Symmetry Detection with CLIP

Paper • 2508.14197 • Published Aug 19 • 8
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

Paper • 2509.02544 • Published Sep 2 • 121
Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation

Paper • 2509.00428 • Published Aug 30 • 17
Symbolic Graphics Programming with Large Language Models

Paper • 2509.05208 • Published Sep 5 • 45
LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence

Paper • 2509.12203 • Published Sep 15 • 19
Locality in Image Diffusion Models Emerges from Data Statistics

Paper • 2509.09672 • Published Sep 11 • 12
LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning in Open-World Scenarios

Paper • 2509.09926 • Published Sep 12 • 13
Single-stream Policy Optimization

Paper • 2509.13232 • Published Sep 16 • 33
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis

Paper • 2509.10441 • Published Sep 12 • 30
WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents

Paper • 2509.13309 • Published Sep 16 • 66
Towards General Agentic Intelligence via Environment Scaling

Paper • 2509.13311 • Published Sep 16 • 69
Stable Part Diffusion 4D: Multi-View RGB and Kinematic Parts Video Generation

Paper • 2509.10687 • Published Sep 12 • 6
RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation

Paper • 2509.15212 • Published Sep 18 • 21
AToken: A Unified Tokenizer for Vision

Paper • 2509.14476 • Published Sep 17 • 36
Qwen3-Omni Technical Report

Paper • 2509.17765 • Published Sep 22 • 131
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

Paper • 2509.21268 • Published 28 days ago • 100
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Paper • 2509.22186 • Published 27 days ago • 121
Fine-tuning Done Right in Model Editing

Paper • 2509.22072 • Published 27 days ago • 27
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published 24 days ago • 133
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning

Paper • 2509.22601 • Published 27 days ago • 29
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models

Paper • 2509.26628 • Published 23 days ago • 13
More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models

Paper • 2509.25848 • Published 24 days ago • 77
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation

Paper • 2509.25849 • Published 24 days ago • 46
VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing

Paper • 2509.22651 • Published 27 days ago • 22
LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer

Paper • 2509.22414 • Published 27 days ago • 21
LongCodeZip: Compress Long Context for Code Language Models

Paper • 2510.00446 • Published 23 days ago • 106
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping

Paper • 2509.21880 • Published 28 days ago • 44
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

Paper • 2510.05034 • Published 17 days ago • 45
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models

Paper • 2510.03561 • Published 20 days ago • 23
Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs

Paper • 2509.25771 • Published 24 days ago • 10
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention

Paper • 2510.04212 • Published 18 days ago • 22
Efficient Intent Detection with Dual Sentence Encoders

Paper • 2003.04807 • Published Mar 10, 2020 • 2
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Paper • 2510.11696 • Published 10 days ago • 164
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs

Paper • 2510.09201 • Published 13 days ago • 46
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning

Paper • 2510.12693 • Published 9 days ago • 25
