Cosmos Predict 2.5 & Transfer 2.5: Evolving the World Foundation Models for Physical AI

Community Article · Published October 28, 2025


The NVIDIA Cosmos family of open world models is redefining how we model and simulate the real world. Built for robotics, autonomous systems, and simulation-driven AI, Cosmos world foundation models (WFMs) enable machines to see, imagine, and reason about physical reality.

With the launch of Cosmos Predict 2.5 and Cosmos Transfer 2.5, world generation has taken another major leap forward. These models extend the Cosmos family into longer horizons, richer viewpoints, and more adaptive domain transformations, laying the groundwork for scalable physical AI.

🔮 Cosmos Predict 2.5: World Generation

Cosmos Predict 2.5 merges what were once three separate models (Text2World, Image2World, and Video2World) into a single, unified architecture capable of generating consistent, controllable video worlds from several input modalities. Trained on 200 million high-quality clips and refined as a unified model with a new reinforcement learning (RL) algorithm, it outperforms Cosmos Predict 1 in quality and prompt alignment when generating high-quality synthetic video data from single frames.
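
Because the three former workflows share one model, the entry point can be thought of as a single request type whose conditioning mode is determined by which inputs are supplied. The sketch below is a minimal illustration of that idea; the WorldGenRequest and build_request names are hypothetical placeholders, and the real inference scripts are in the GitHub repository linked below.

```python
# Minimal sketch of a unified text/image/video-conditioned request.
# These names are hypothetical, not the actual cosmos-predict2.5 API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class WorldGenRequest:
    prompt: str                       # text condition (always present)
    image_path: Optional[str] = None  # single conditioning frame (Image2World-style)
    video_path: Optional[str] = None  # conditioning clip (Video2World-style)
    num_frames: int = 121             # assumed default clip length
    seed: int = 0

def build_request(prompt: str, image_path: Optional[str] = None,
                  video_path: Optional[str] = None) -> WorldGenRequest:
    """One request type covers all three modes; conditioning follows
    from which optional fields are populated."""
    if image_path and video_path:
        raise ValueError("Provide either an image or a video condition, not both.")
    return WorldGenRequest(prompt=prompt, image_path=image_path, video_path=video_path)

# Text2World-style: prompt only.
req_text = build_request("A forklift moves pallets across a warehouse floor.")
# Image2World-style: prompt plus a single frame.
req_image = build_request("Continue this scene for five seconds.", image_path="frame_0.png")
# Video2World-style: prompt plus a short conditioning clip.
req_video = build_request("Extend the maneuver through the intersection.", video_path="clip.mp4")
```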

✨ Key Highlights
  • One Powerful Model: Cosmos Predict 2.5 combines the capabilities of Text2World, Image2World, and Video2World in a single model and uses Cosmos Reason 1, a Physical AI reasoning vision language model (VLM), as its text encoder, saving compute cost and the time needed to post-train and build your Physical AI workflows.
  • Extended Video Horizons: Produces sequences up to 30 seconds long while maintaining spatial-temporal coherence, which is important for simulation, long-horizon prediction, and robotic planning.
  • Multi-View Generation: Creates synchronized camera views for realistic multi-camera setups in autonomous vehicle (AV) training or robot vision, with camera control (see the sketch after this list).
  • Grounded Prompt Alignment: Integrates Cosmos Reason as a text-scene encoder, tightening semantic grounding and reducing hallucinations.
  • Efficiency-Driven Design: Despite its scale and capability, Predict 2.5 improves overall quality, inference speed, and resource efficiency through architectural refinements.
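
As an illustration of the multi-view capability above, here is a minimal sketch of how a synchronized camera rig might be described for generation. The CameraSpec and MultiViewConfig structures, camera names, and default values are illustrative assumptions, not the actual Cosmos Predict 2.5 configuration schema.

```python
# Hypothetical description of a synchronized multi-camera rig for world
# generation. Names and defaults are illustrative, not the real config schema.
from dataclasses import dataclass, field

@dataclass
class CameraSpec:
    name: str
    yaw_deg: float            # rotation around the vehicle's up axis
    pitch_deg: float = 0.0
    fov_deg: float = 90.0

@dataclass
class MultiViewConfig:
    prompt: str
    cameras: list = field(default_factory=list)
    num_frames: int = 121     # assumed clip length
    synchronized: bool = True # all views share the same world state per frame

rig = MultiViewConfig(
    prompt="Rainy urban intersection at dusk with moderate traffic.",
    cameras=[
        CameraSpec("front_wide", yaw_deg=0.0, fov_deg=120.0),
        CameraSpec("front_left", yaw_deg=55.0),
        CameraSpec("front_right", yaw_deg=-55.0),
        CameraSpec("rear", yaw_deg=180.0),
    ],
)
print(f"{len(rig.cameras)} synchronized views over {rig.num_frames} frames")
```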

🔗 Explore Predict 2.5 →

๐Ÿ” Cosmos Transfer 2.5: Spatially Controlled World Transformation

While Predict 2.5 creates worlds, Transfer 2.5 transforms them, enabling high-fidelity, spatially conditioned world-to-world translation. Compared to Cosmos Transfer 1-7B, Cosmos Transfer 2.5-2B is much smaller, offers better prompt and physics alignment, and shows less hallucination and error accumulation in long video generations.

✨ Key Highlights

  • Smaller, faster, and higher quality: 3.5× smaller than its predecessor, yet faster and higher quality; optimized for deployment in both research and production pipelines.

  • Policy training for robots: Robot policy models trained with Cosmos Transfer 2.5-2B augmentation significantly outperform policies trained without it when generalizing to novel environments.

  • Better adherence to control signals for autonomous vehicles: Evaluating 3D lane and cuboid detection on generated multi-view videos, using real-world scenarios as the control input, shows up to a 60% improvement over the previous model (Transfer1-7B-Sample-AV), with LATR used for lane detection and BEVFormer for cuboid detection.

  • Multi-camera consistency for autonomous vehicles: Cosmos Transfer 2.5 improves on Cosmos Transfer 1 by distributing control blocks more evenly throughout the network for smoother integration of conditioning information.

  • Less error accumulation: Transfer 2.5 shows less error accumulation than Cosmos Transfer 1-7B across all four control modalities (edge, blur, depth, and segmentation); a configuration sketch combining these modalities follows this list.
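
To make the multi-control idea concrete, here is a small sketch of how several spatial control signals might be combined in one request, with per-modality weights. The ControlInput and TransferRequest structures and the weighting scheme are illustrative assumptions, not the actual cosmos-transfer2.5 configuration format.

```python
# Hypothetical multi-modality control configuration for a Transfer-style
# world-to-world translation. Names and fields are illustrative assumptions.
from dataclasses import dataclass, field

ALLOWED_MODALITIES = {"edge", "blur", "depth", "segmentation"}

@dataclass
class ControlInput:
    modality: str        # one of ALLOWED_MODALITIES
    video_path: str      # control video extracted from the source scene
    weight: float = 1.0  # relative influence of this modality

@dataclass
class TransferRequest:
    prompt: str                      # target appearance / environment
    controls: list = field(default_factory=list)
    num_frames: int = 121            # assumed clip length

def validate(req: TransferRequest) -> None:
    for c in req.controls:
        if c.modality not in ALLOWED_MODALITIES:
            raise ValueError(f"Unknown control modality: {c.modality}")

req = TransferRequest(
    prompt="The same driving scene, but at night in heavy snow.",
    controls=[
        ControlInput("depth", "scene_depth.mp4", weight=1.0),
        ControlInput("segmentation", "scene_seg.mp4", weight=0.7),
        ControlInput("edge", "scene_edge.mp4", weight=0.3),
    ],
)
validate(req)
```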

🔗 Explore Transfer 2.5 →

Other Updates from Cosmos Platform

🧠 Cosmos Reason 1: Reasoning Vision Language Model

Part of the Cosmos WFM family, NVIDIA Cosmos Reason is an open, customizable, 7-billion-parameter reasoning vision language model (VLM) for physical AI and robotics. The model enables robots and vision AI agents to reason like humans, using prior knowledge, physics understanding, and common sense to understand and act in the real world. Cosmos Reason has topped the Physical Reasoning leaderboard. The model is also available as an NVIDIA NIM, which offers secure, easy-to-use microservices for deploying high-performance generative AI across any environment.
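
Since NIM microservices are typically exposed through an OpenAI-compatible endpoint, a reasoning query against a deployed Cosmos Reason NIM might look roughly like the sketch below. The base URL, model ID, and image-message format are assumptions about a typical NIM deployment; check the NIM documentation for the exact values.

```python
# Rough sketch of querying a Cosmos Reason NIM via an OpenAI-compatible
# chat endpoint. Endpoint URL and model ID are assumptions, not confirmed values.
import base64
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",   # assumed local NIM endpoint
    api_key="not-needed-for-local-nim",
)

with open("warehouse_frame.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="nvidia/cosmos-reason1-7b",  # assumed model ID; verify against your deployment
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Is it safe for the robot arm to pick up the leftmost box? Explain."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    max_tokens=512,
)
print(response.choices[0].message.content)
```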

๐Ÿ” Cosmos Dataset Search: Large-scale Data Search and Retrieval

To accelerate model post-training, NVIDIA Cosmos Dataset Search provides a vector-based workflow that lets physical AI developers instantly search and retrieve targeted scenarios from massive training datasets. It uses the Cosmos-Embed NIM for highly accurate semantic search and connects to NVIDIA Cosmos Curator to refine datasets and retrieve queried data with high efficiency and accuracy. With the ability to search billions of clips in seconds, Cosmos Dataset Search dramatically shortens post-training, cutting development cycles from years to days.
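
Under the hood, this kind of retrieval is a nearest-neighbor search over clip embeddings. The sketch below illustrates the idea with a placeholder embedding function and random vectors; it is not the Cosmos-Embed NIM API or the Cosmos Dataset Search implementation, just the underlying vector-search pattern.

```python
# Conceptual vector search over clip embeddings: embed a text query, then
# rank clips by cosine similarity. embed_text() stands in for a call to an
# embedding service such as the Cosmos-Embed NIM (hypothetical here).
import numpy as np

DIM = 512

def embed_text(query: str) -> np.ndarray:
    """Placeholder text encoder; a real system would call an embedding model."""
    rng = np.random.default_rng(abs(hash(query)) % (2**32))
    v = rng.standard_normal(DIM)
    return v / np.linalg.norm(v)

# Stand-ins for embeddings precomputed over a clip corpus.
clip_ids = [f"clip_{i:06d}" for i in range(10_000)]
clip_embeddings = np.random.default_rng(0).standard_normal((10_000, DIM))
clip_embeddings /= np.linalg.norm(clip_embeddings, axis=1, keepdims=True)

def search(query: str, top_k: int = 5):
    q = embed_text(query)
    scores = clip_embeddings @ q  # cosine similarity (all vectors are unit-norm)
    best = np.argsort(scores)[::-1][:top_k]
    return [(clip_ids[i], float(scores[i])) for i in best]

for clip_id, score in search("pedestrian crossing at night in the rain"):
    print(clip_id, round(score, 3))
```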

🧩 Use Cases and Workflows

🔗 Cosmos Cookbook: The Cosmos Cookbook offers developers step-by-step recipes and post-training scripts to quickly build, customize, and deploy NVIDIA's Cosmos world foundation models for robotics and autonomous systems.

Read the 🔗 latest white paper from the NVIDIA Research team for more details on the capabilities and benchmarks of Cosmos Predict 2.5 and Cosmos Transfer 2.5.

🧠 Resources

💪 Get Started Today

All Cosmos WFMs are available on Hugging Face - model checkpoints here.

  • Cosmos Predict 2.5 - Multimodal world foundation model for generating next frames based on input prompts. Explore the GitHub repository for inference and post-training scripts.
  • Cosmos Transfer 2.5 - Multicontrol world foundation model for data augmentation from structured video inputs. Explore the GitHub repository for inference and post-training scripts.

Join our community for regular updates, Q&A, livestreams and hands-on tutorials!
