Kairos 3.0

💜 Kairos Platform | 🖥️ GitHub | 🤗 Hugging Face | 🤖 ModelScope | 📑 Paper


Kairos 3.0 is grounded in physical laws as its cognitive foundation, establishing a unified cross-embodiment world modeling framework. Featuring a 4B-parameter architecture with a custom hybrid linear attention operator, it unifies multimodal understanding, generation, and action prediction for real-time edge deployment. By achieving physics-level deep cognition and low-latency inference, it empowers high-precision action prediction and HD generation for both physical and digital embodied AI applications.

🎯 1. Motivation

While Scaling Laws are emerging in Embodied AI, their efficiency is severely bottlenecked by data heterogeneity, poor long-horizon reasoning, and edge-side compute constraints. These hurdles make scaling alone insufficient for reliable interaction, hindering the path to industrial-grade General Embodied Intelligence.

🌟 2. Kairos 3.0 Framework

🌍 Unified World Modeling Framework

Kairos 3.0 uses fundamental physical and causal laws as its cognitive foundation. By integrating real-robot interaction, structured human behavior, and Chain-of-Thought (CoT) data, it breaks heterogeneity barriers and boosts data reuse efficiency. This shifts the paradigm from simple imitation to physics-level deep understanding, enabling robust generalization and long-horizon reasoning at a more efficient model scale.

🔗 Integrated Multimodal Architecture

Kairos 3.0 is designed as a unified end-to-end pipeline for understanding, generating, and predicting the world. Leveraging physical laws and causal CoT, the model does not just "see" an environment but "understands" its underlying logic. This allows precise decomposition of complex tasks, seamless planning, and reliable execution in a single intelligence loop.

⚡ Linear-Time Attention for World Models

Kairos 3.0 introduces the first Hybrid Linear Attention operator designed specifically for world models. By reducing the cost of attention over a sequence of length $n$ from $O(n^2)$ to $O(n)$, it cuts VRAM and compute overhead while maintaining long-sequence capabilities. This enables the industry's first real-time on-robot inference for an open-source world model.
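As a rough illustration of where the $O(n^2)$ to $O(n)$ reduction comes from, the sketch below contrasts standard softmax attention with a generic kernelized linear attention in NumPy. It is not the Kairos operator: the feature map `elu(x) + 1`, the function names, and the single-head, non-causal layout are all assumptions made for illustration, and the hybrid design itself is not specified in this README.

```python
import numpy as np


def softmax_attention(q, k, v):
    """Standard attention: builds an (n, n) score matrix, so cost grows as O(n^2)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def feature_map(x):
    """Positive feature map phi(x) = elu(x) + 1 (an illustrative choice)."""
    return np.where(x > 0, x + 1.0, np.exp(x))


def linear_attention(q, k, v):
    """Kernelized attention phi(Q) (phi(K)^T V); the (n, n) matrix is never formed."""
    q, k = feature_map(q), feature_map(k)
    kv = k.T @ v            # (d, d_v) summary, accumulated over the sequence once
    z = k.sum(axis=0)       # (d,) accumulator for the normalizer
    return (q @ kv) / (q @ z)[:, None]


def linear_attention_quadratic_reference(q, k, v):
    """Same quantity computed the O(n^2) way, used only to check correctness."""
    q, k = feature_map(q), feature_map(k)
    weights = q @ k.T
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

The linear variant only ever stores a `(d, d_v)` summary of keys and values, so memory and compute grow with the sequence length `n` rather than with `n * n`. The reference function exists only to show that reordering the matrix products leaves the result unchanged.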

✨ 3. Demos

Physical–causal consistency | Cross-embodiment generalization | High-efficiency inference

🧠 Physical–causal consistency

Kairos leverages causal CoT and physical laws to transform multimodal inputs into deep task logic. It enables autonomous planning and feasibility analysis, shifting the system from "executing commands" to "understanding intent" for real-world robotic actions.

🎨 Cross-embodiment generalization

  • Unified Cross-Embodiment Generation: A single "brain" that generalizes across single-arm, dual-arm, and dexterous-hand platforms. Kairos enables shared, transferable world knowledge with maximal adaptability.
  • Broad Hardware Support: Native compatibility with Agibot G1, Unitree G1, and Songling PIPER, significantly reducing development costs through zero-shot multi-task generalization.

🔮 High-efficiency inference

Real-time Edge Performance: Industry-leading inference speed with ultra-low resource consumption. Optimized for low-latency, high-reliability deployment across single or multi-GPU embodied systems.

📦 4. Model Zoo

| Download Links | Model Version | Use cases | Highlights |
|---|---|---|---|
| 🤗 HuggingFace, 🤖 ModelScope | kairos-4B-480P | 480P general pretrained model | 480P pretrained model for downstream fine-tuning. |
| 🤗 HuggingFace, 🤖 ModelScope | kairos-4B-robot-480P | Robot manipulation & real-world closed-loop control | Specialized for embodied AI; leading accuracy on PAI-Bench. |
| 🤗 HuggingFace, 🤖 ModelScope | kairos-4B-robot-480P-distillation | On-robot integration, edge computing, low-power efficiency | Ultra-lightweight via distillation; enables real-time inference on embedded/edge devices. |
| 🤗 HuggingFace, 🤖 ModelScope | kairos-4B-720p | HD visual generation & complex physical reasoning | Supports 720P HD output with enhanced fine-grained detail capture. |

📈 5. Evaluation

🎯 5.1 Accuracy Benchmarks

| Domain | Benchmarks | Kairos-Robot | Cosmos 2.5-2B* | Wan 2.2-5B* | Cosmos 2.5-14B* | Lingbot* |
|---|---|---|---|---|---|---|
| Robot | PAI-Bench-robot | 80.03 | 78.3 | 78.6 | 79.4 | 79.96 |
| | WorldModelBench-robot TI2V | 9.08 | 9.04 | 8.52 | 8.94 | 9.04 |
| | DreamGen Bench (PA/IF) | 0.529/0.609 | 0.418/0.568 | 0.314/0.543 | 0.495/0.478 | 0.466/0.569 |

| Domain | Benchmarks | Kairos 3.0-4B | Cosmos 2.5-2B* | Wan 2.2-5B* | Cosmos 2.5-14B |
|---|---|---|---|---|---|
| General | PAI-Bench | 80.84 | 81.0 | 80.4 | 81.0 |
| | WorldModelBench | 8.94 | 8.86 | 8.70 | 9.02* |
| | VideoPHY | 45.55 | 44.64 | 38.85 | - |

*(Results reproduced from open-source model baselines. "robot" refers to the corresponding results on the robot subset.)

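Kairos models are competitive with or ahead of the listed baselines across these benchmarks. In embodied scenarios, Kairos-Robot posts the highest score on all three robot benchmarks: PAI-Bench-robot (80.03), WorldModelBench-robot TI2V (9.08), and DreamGen Bench (0.529/0.609 PA/IF). For general world modeling, Kairos 3.0-4B achieves the highest reported VideoPHY score (45.55), outscores Cosmos 2.5-2B and Wan 2.2-5B on WorldModelBench (8.94) while trailing Cosmos 2.5-14B (9.02), and comes within 0.2 points of the best PAI-Bench result (80.84 vs. 81.0), balancing precision and efficiency at a compact 4B scale.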

⚡ 5.2 Deployment

5.2.1 Real-time Inference

| GPU | Resolution | Memory (GB) | 1 GPU (s) | 4 GPUs (s) |
|---|---|---|---|---|
| NV-A800 | 480P | 23.5 | 11.7 | 3.0 |
| NV-RTX5090 | 480P | 13.9 | 11.4 | 5.7 |

*(Results based on kairos-4B-robot-480P-distillation.)

5.2.2 Benchmark for A800 GPU

| Model | Parameters | Memory (GB) | Complexity (PFLOPs) | 1 GPU (s) | 4 GPUs (s) |
|---|---|---|---|---|---|
| Kairos 3.0 | 4B | 23.5 | 2.3 | 43.3 | 9.5 |
| Cosmos 2.5 | 14B | 70.2 | 156.5 (~70x) | 2526.0 | 687.2 |
| Wan 2.2 | 5B | 23.4 | 16.6 (~7x) | 201.0 | 85.0 |
| Lingbot | 28B | 46.1 | 347.4 (~160x) | 5525.0 | 1436.0 |

*(Evaluation setting: TI2V mode with 720P/5s.)
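The relative-cost multipliers in the table follow from simple division, which the short sketch below reproduces. The figures are copied from the A800 table; the dictionary layout and function names are illustrative assumptions.

```python
# (complexity in PFLOPs, 1-GPU seconds, 4-GPU seconds), copied from the A800 table
A800_RESULTS = {
    "Kairos 3.0": (2.3, 43.3, 9.5),
    "Cosmos 2.5": (156.5, 2526.0, 687.2),
    "Wan 2.2": (16.6, 201.0, 85.0),
    "Lingbot": (347.4, 5525.0, 1436.0),
}


def cost_relative_to(baseline, results=A800_RESULTS):
    """Return {model: (compute ratio, 1-GPU latency ratio)} against the baseline."""
    base_flops, base_latency, _ = results[baseline]
    return {
        name: (flops / base_flops, latency / base_latency)
        for name, (flops, latency, _) in results.items()
    }


def four_gpu_speedup(results=A800_RESULTS):
    """Return {model: 1-GPU time divided by 4-GPU time}."""
    return {name: t1 / t4 for name, (_, t1, t4) in results.items()}
```

With the table's figures, `cost_relative_to("Kairos 3.0")` gives compute ratios of roughly 68x for Cosmos 2.5, 7.2x for Wan 2.2, and 151x for Lingbot, and 1-GPU latency ratios of roughly 58x, 4.6x, and 128x.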

🔧 6. Quick Start

6.1 Environment Installation

# Clone the repository
git clone https://github.com/kairos-agi/kairos-sensenova.git
cd kairos-sensenova

# You can set up the environment in two ways:
# 1) Build container from the Docker image
# 2) Build the environment from requirements with conda or venv

# 1) Docker image:
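# Log in to the GitHub Container Registry, then pull the Docker image
# (replace ghp_xxxxxxxxxxxxxxxxx and username with your own GitHub token and username)
echo ghp_xxxxxxxxxxxxxxxxx | docker login ghcr.io -u username --password-stdin
docker pull ghcr.io/kairos-agi/kairos-sensenova:v0.0.1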

# Create a container using Docker
docker run --rm -it \
  --gpus all \
  -v "$(pwd)":/workspace \
  ghcr.io/kairos-agi/kairos-sensenova:v0.0.1 \
  bash


# 2) Requirements:
# Build a Python environment with python>=3.10, torch>=2.6, and cuda>=12.6
# Install the requirements
pip install -r requirements.txt
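
To confirm that an existing environment meets the stated minimums (python>=3.10, torch>=2.6, cuda>=12.6), a small check along these lines may help. This is a sketch, not part of the repository: the helper names are hypothetical, and it assumes that the CUDA requirement refers to the CUDA version PyTorch was built against, as reported by `torch.version.cuda`.

```python
import re
import sys


def parse_version(text):
    """Turn a string such as '2.6.0+cu126' into (2, 6, 0), dropping build suffixes."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version, minimum):
    """Compare a version string against a minimum given as a tuple of ints."""
    return parse_version(version) >= minimum


def check_environment():
    """Return a dict of pass/fail flags for the three stated requirements."""
    report = {"python": sys.version_info[:2] >= (3, 10)}
    try:
        import torch  # imported lazily so the check runs even without PyTorch
    except ImportError:
        report["torch"] = False
        report["cuda"] = False
        return report
    report["torch"] = meets_minimum(str(torch.__version__), (2, 6))
    cuda = torch.version.cuda  # None on CPU-only builds
    report["cuda"] = cuda is not None and meets_minimum(cuda, (12, 6))
    return report
```

Calling `check_environment()` inside the container or the conda/venv environment returns a dictionary such as `{"python": ..., "torch": ..., "cuda": ...}`, where any `False` entry points at the requirement to revisit.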

6.2 Download Models

  • Download with huggingface
pip install -U huggingface_hub 

# Download kairos model
# 4B-480P
hf download kairos-agi/kairos-sensenova-4B-480P-pretrained \
  --local-dir models/Kairos-model/kairos-sensenova-4B-480P-pretrained 

# 4B-720P
hf download kairos-agi/kairos-sensenova-4B-720P \
  --local-dir models/Kairos-model/kairos-sensenova-4B-720P 


  • Download with modelscope
pip install modelscope

# Download kairos model
# 4B-480P
modelscope download kairos-team/kairos-sensenova-4B-480P-pretrained \
  --local_dir models/Kairos-model/kairos-sensenova-4B-480P-pretrained 

# 4B-720P
modelscope download kairos-team/kairos-sensenova-4B-720P \
  --local_dir models/Kairos-model/kairos-sensenova-4B-720P 
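
The Hugging Face checkpoints can presumably also be fetched from Python with `huggingface_hub.snapshot_download`, which may be convenient inside setup scripts. The repository IDs and target directories below mirror the `hf download` commands above; the `CHECKPOINTS` list and the `local_dir_for` and `download_checkpoints` helpers are illustrative names, not part of the Kairos codebase.

```python
from pathlib import Path

MODEL_ROOT = Path("models/Kairos-model")
CHECKPOINTS = [
    "kairos-agi/kairos-sensenova-4B-480P-pretrained",
    "kairos-agi/kairos-sensenova-4B-720P",
]


def local_dir_for(repo_id, root=MODEL_ROOT):
    """Map 'org/name' to '<root>/name', matching the --local-dir layout above."""
    return Path(root) / repo_id.split("/", 1)[1]


def download_checkpoints(repo_ids=CHECKPOINTS, root=MODEL_ROOT):
    """Download each repository snapshot and return the local directories."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    paths = []
    for repo_id in repo_ids:
        target = local_dir_for(repo_id, root)
        snapshot_download(repo_id=repo_id, local_dir=str(target))
        paths.append(target)
    return paths
```

Running `download_checkpoints()` from the repository root should leave the weights in the same directories as the CLI commands, so either route can be used interchangeably.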

6.3 Run Inference

# Step 1: Fetch the auxiliary models
mkdir -p models/Qwen models/Wan2.1-T2V-14B

# Download Qwen2.5-VL, used as the text encoder
hf download Qwen/Qwen2.5-VL-7B-Instruct-AWQ \
  --local-dir models/Qwen/Qwen2.5-VL-7B-Instruct-AWQ

# Download the Wan2.1 VAE, used as the VAE encoder/decoder
hf download Wan-AI/Wan2.1-T2V-14B \
  --local-dir models/Wan2.1-T2V-14B \
  --include "Wan2.1_VAE.pth"

# Step 2: Run the examples
# Text2Video
bash examples/inference.sh examples/example_t2v.json
# Text&FirstImage2Video
bash examples/inference.sh examples/example_ti2v.json
# FirstImage2Video
bash examples/inference.sh examples/example_i2v.json
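
Before launching the examples, it can be useful to verify that the auxiliary weights fetched above are where the download commands placed them. The sketch below assumes the inference scripts read from these exact paths, which this README implies but does not state; `REQUIRED_ASSETS` and `missing_assets` are hypothetical names.

```python
from pathlib import Path

# Paths taken from the download commands above, relative to the repository root.
REQUIRED_ASSETS = [
    "models/Qwen/Qwen2.5-VL-7B-Instruct-AWQ",
    "models/Wan2.1-T2V-14B/Wan2.1_VAE.pth",
]


def missing_assets(root=".", required=REQUIRED_ASSETS):
    """Return the required paths that do not exist under root."""
    base = Path(root)
    return [rel for rel in required if not (base / rel).exists()]
```

An empty list from `missing_assets()` means both the text encoder directory and the VAE checkpoint were found; any returned entries name the downloads to repeat.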

👥 7. About Us

Developed and maintained by the Kairos Team. We specialize in Embodied Intelligence and World Model research, with a mission to build Artificial General Intelligence (AGI) that truly understands the physical world. Our goal is to accelerate the industrialization of embodied technologies and reshape the global landscape of AI competition.

📄 8. License

Kairos is open-sourced under the Apache License 2.0. Feel free to use, modify, and build commercial products on top of it. Check the LICENSE file for the full text.

9. Acknowledgements

We would like to thank the contributors to Qwen-Image, Wan2.1, DiffSynth-Studio and HuggingFace for their open-source research contributions.


โญ Star us on GitHub if you find Kairos 3.0 helpful!
