Kairos 3.0
Kairos Platform | GitHub | Hugging Face | ModelScope | Paper
Kairos 3.0 is grounded in physical laws as its cognitive foundation, establishing a unified cross-embodiment world modeling framework. Featuring a 4B-parameter architecture with a custom hybrid linear attention operator, it unifies multimodal understanding, generation, and action prediction for real-time edge deployment. By achieving physics-level deep cognition and low-latency inference, it empowers high-precision action prediction and HD generation for both physical and digital embodied AI applications.
1. Motivation
While Scaling Laws are emerging in Embodied AI, their efficiency is severely bottlenecked by data heterogeneity, poor long-horizon reasoning, and edge-side compute constraints. These hurdles make scaling alone insufficient for reliable interaction, hindering the path to industrial-grade General Embodied Intelligence.
2. Kairos 3.0 Framework
Unified World Modeling Framework
Kairos 3.0 uses fundamental physical and causal laws as its cognitive foundation. By integrating real-robot interaction, structured human behavior, and Chain-of-Thought (CoT) data, it breaks heterogeneity barriers and boosts data reuse efficiency. This shifts the paradigm from simple imitation to physics-level deep understanding, enabling robust generalization and long-horizon reasoning at a more efficient model scale.
Integrated Multimodal Architecture
Kairos 3.0 is designed as a unified end-to-end pipeline for understanding, generating, and predicting the world. Leveraging physical laws and causal CoT, the model doesn't just "see" an environment but "understands" its underlying logic. This allows for precise decomposition of complex tasks, seamless planning, and reliable execution in a single intelligence loop.
Linear-Time Attention for World Models
Kairos 3.0 introduces the first Hybrid Linear Attention operator designed specifically for world models. By reducing attention complexity in sequence length $n$ from $O(n^2)$ to $O(n)$, it cuts VRAM and compute overhead while maintaining long-sequence capabilities. This enables the industry's first real-time on-robot inference for an open-source world model.
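To make the $O(n^2)$ to $O(n)$ claim concrete, here is a minimal sketch of generic kernelized causal linear attention in plain Python. It is not the Kairos hybrid operator, whose details are not given in this README. The feature map `phi` (elu(x) + 1) and both function names are assumptions chosen for illustration. The sketch shows that carrying a fixed-size running state gives the same output as rescanning all earlier keys for every query.

```python
import math

def phi(x):
    # Positive feature map elu(x) + 1; an illustrative choice, not taken from Kairos.
    return x + 1.0 if x > 0 else math.exp(x)

def quadratic_causal_attention(Q, K, V):
    """Reference form: each query rescans all earlier keys, so cost grows as n^2."""
    out = []
    for i, q in enumerate(Q):
        fq = [phi(a) for a in q]
        weights = [sum(a * phi(b) for a, b in zip(fq, K[j])) for j in range(i + 1)]
        norm = sum(weights)
        out.append([sum(w * V[j][c] for j, w in enumerate(weights)) / norm
                    for c in range(len(V[0]))])
    return out

def linear_causal_attention(Q, K, V):
    """Linear form: one pass with a fixed-size state, so cost grows as n."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    z = [0.0] * d_k                        # running sum of phi(k)
    out = []
    for q, k, v in zip(Q, K, V):
        fk = [phi(a) for a in k]
        for r in range(d_k):
            z[r] += fk[r]
            for c in range(d_v):
                S[r][c] += fk[r] * v[c]
        fq = [phi(a) for a in q]
        norm = sum(a * b for a, b in zip(fq, z))
        out.append([sum(fq[r] * S[r][c] for r in range(d_k)) / norm
                    for c in range(d_v)])
    return out

# Toy inputs: 4 tokens, key dimension 2, value dimension 3.
Q = [[0.2, -1.0], [1.5, 0.3], [-0.7, 0.8], [0.0, 2.0]]
K = [[0.5, 0.1], [-0.3, 1.2], [0.9, -0.4], [1.1, 0.6]]
V = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0], [2.0, 1.0, 1.0], [-1.0, 3.0, 0.5]]

reference = quadratic_causal_attention(Q, K, V)
fast = linear_causal_attention(Q, K, V)
assert all(abs(a - b) < 1e-9 for r, f in zip(reference, fast) for a, b in zip(r, f))
```

The state `S` and `z` have sizes that depend only on the head dimensions, not on the number of tokens, which is why memory stays flat as the sequence grows.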
3. Demos
[Demo videos: physical–causal consistency, cross-embodiment generalization, high-efficiency inference]
Physical–causal consistency
Kairos leverages causal CoT and physical laws to transform multimodal inputs into deep task logic. It enables autonomous planning and feasibility analysis, shifting the system from "executing commands" to "understanding intent" for real-world robotic actions.
Cross-embodiment generalization
- Unified cross-embodiment generation: a single "brain" that generalizes across single-arm, dual-arm, and dexterous-hand platforms, giving them shared, transferable world knowledge.
- Broad hardware support: native compatibility with Agibot G1, Unitree G1, and Songling PIPER, reducing development costs through zero-shot multi-task generalization.
High-efficiency inference
Real-time Edge Performance: Industry-leading inference speed with ultra-low resource consumption. Optimized for low-latency, high-reliability deployment across single or multi-GPU embodied systems.
4. Model Zoo
| Download Links | Model Version | Use cases | Highlights |
|---|---|---|---|
| HuggingFace, ModelScope | kairos-4B-480P | 480p general pretrained model | 480p pretrained model for downstream fine-tuning. |
| HuggingFace, ModelScope | kairos-4B-robot-480P | Robot manipulation & real-world closed-loop control | Specialized for embodied AI; leading accuracy on PAI-Bench. |
| HuggingFace, ModelScope | kairos-4B-robot-480P-distillation | On-robot integration, edge computing, low-power efficiency | Ultra-lightweight via distillation; enables real-time inference on embedded/edge devices. |
| HuggingFace, ModelScope | kairos-4B-720p | HD visual generation & complex physical reasoning | Supports 720P HD output with enhanced fine-grained detail capture. |
5. Evaluation
5.1 Accuracy Benchmarks
| Domain | Benchmarks | Kairos-Robot | Cosmos 2.5-2B* | Wan 2.2-5B* | Cosmos 2.5-14B* | Lingbot* |
|---|---|---|---|---|---|---|
| Robot | PAI-Bench-robot | 80.03 | 78.3 | 78.6 | 79.4 | 79.96 |
| | WorldModelBench-robot TI2V | 9.08 | 9.04 | 8.52 | 8.94 | 9.04 |
| | DreamGen Bench (PA/IF) | 0.529/0.609 | 0.418/0.568 | 0.314/0.543 | 0.495/0.478 | 0.466/0.569 |

| Domain | Benchmarks | Kairos 3.0-4B | Cosmos 2.5-2B* | Wan 2.2-5B* | Cosmos 2.5-14B |
|---|---|---|---|---|---|
| General | PAI-Bench | 80.84 | 81.0 | 80.4 | 81.0 |
| | WorldModelBench | 8.94 | 8.86 | 8.70 | 9.02* |
| | VideoPHY | 45.55 | 44.64 | 38.85 | - |
*(Results reproduced from open-source model baselines; "robot" refers to the corresponding results on the robot subset.)
Kairos models are competitive with or ahead of the listed baselines across these benchmarks. In embodied scenarios, Kairos-Robot leads PAI-Bench-robot (80.03), WorldModelBench-robot TI2V (9.08), and both DreamGen Bench metrics (0.529 PA, 0.609 IF). For general world modeling, Kairos 3.0-4B posts the best VideoPHY score (45.55), ranks second on WorldModelBench (8.94, behind Cosmos 2.5-14B at 9.02), and comes within 0.2 points of the best PAI-Bench score (80.84 vs. 81.0), all at a compact 4B scale.
5.2 Deployment
5.2.1 Real-time Inference
| GPU | Resolution | Memory (GB) | 1 GPU (s) | 4 GPUs (s) |
|---|---|---|---|---|
| NV-A800 | 480P | 23.5 | 11.7 | 3.0 |
| NV-RTX5090 | 480P | 13.9 | 11.4 | 5.7 |
*(Results based on kairos-4B-robot-480P-distillation.)
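The multi-GPU columns are easier to compare as speedup and parallel efficiency. The sketch below assumes the two timing columns report wall-clock seconds for the same workload on 1 and 4 GPUs. The helper name `scaling` is illustrative and not part of the repository.

```python
# Latencies in seconds, copied from the table above: {gpu: {num_gpus: seconds}}.
latency_s = {
    "NV-A800": {1: 11.7, 4: 3.0},
    "NV-RTX5090": {1: 11.4, 4: 5.7},
}

def scaling(one_gpu_s, n_gpu_s, n_gpus):
    """Return (speedup, parallel efficiency) of the n-GPU run over the 1-GPU run."""
    speedup = one_gpu_s / n_gpu_s
    return speedup, speedup / n_gpus

for gpu, timings in latency_s.items():
    speedup, efficiency = scaling(timings[1], timings[4], 4)
    print(f"{gpu}: {speedup:.2f}x speedup on 4 GPUs, efficiency {efficiency:.3f}")
```

An efficiency near 1.0 means the four GPUs are used almost perfectly; a value near 0.5 means half of the added compute is lost to communication or other overhead.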
5.2.2 Benchmark for A800 GPU
| Model | Parameters | Memory (GB) | Complexity (PFLOPs) | 1 GPU (s) | 4 GPUs (s) |
|---|---|---|---|---|---|
| Kairos 3.0 | 4B | 23.5 | 2.3 | 43.3 | 9.5 |
| Cosmos 2.5 | 14B | 70.2 | 156.5 (~70x) | 2526.0 | 687.2 |
| Wan 2.2 | 5B | 23.4 | 16.6 (~7x) | 201.0 | 85.0 |
| Lingbot | 28B | 46.1 | 347.4 (~160x) | 5525.0 | 1436.0 |
*(Evaluation setting: TI2V mode, 720P, 5 s.)
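The relative costs in this table can be recomputed directly from its rounded entries. This is a small worked example, not an official script. Because the inputs are rounded, the ratios may differ slightly from the approximate multipliers quoted in the Complexity column.

```python
# (PFLOPs, 1-GPU seconds, 4-GPU seconds), copied from the table above.
rows = {
    "Kairos 3.0": (2.3, 43.3, 9.5),
    "Cosmos 2.5": (156.5, 2526.0, 687.2),
    "Wan 2.2": (16.6, 201.0, 85.0),
    "Lingbot": (347.4, 5525.0, 1436.0),
}

def relative_to(baseline, table):
    """Divide every column of every row by the matching column of the baseline row."""
    base = table[baseline]
    return {name: tuple(value / ref for value, ref in zip(values, base))
            for name, values in table.items()}

ratios = relative_to("Kairos 3.0", rows)
for name, (flops, one_gpu, four_gpu) in ratios.items():
    print(f"{name}: {flops:.1f}x FLOPs, {one_gpu:.1f}x 1-GPU time, {four_gpu:.1f}x 4-GPU time")
```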
6. Quick Start
6.1 Environment Installation
# Clone the repository
git clone https://github.com/kairos-agi/kairos-sensenova.git
cd kairos-sensenova
# You can set up the environment in two ways:
# 1) Build container from the Docker image
# 2) Build the environment from requirements with conda or venv
# 1) Docker image:
# Log in to ghcr.io (replace the placeholder token and username with your own), then pull the Docker image
echo ghp_xxxxxxxxxxxxxxxxx | docker login ghcr.io -u username --password-stdin
docker pull ghcr.io/kairos-agi/kairos-sensenova:v0.0.1
# Create a container using Docker
docker run --rm -it \
--gpus all \
-v $(pwd):/workspace \
ghcr.io/kairos-agi/kairos-sensenova:v0.0.1 \
bash
# 2) Requirements
# Build a Python environment with python>=3.10, torch>=2.6, and cuda>=12.6
# Install the requirements
pip install -r requirements.txt
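Before installing, it can help to confirm that the environment meets the stated minimums. The following is a sketch, not a script shipped with the repository. The function names are hypothetical, and it assumes that "cuda>=12.6" refers to the CUDA version PyTorch was built against (`torch.version.cuda`).

```python
import re
import sys

def version_tuple(text):
    """Parse leading numeric components, e.g. '2.6.0+cu126' -> (2, 6, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))

def meets_minimum(found, minimum):
    """True if the parsed version is at least `minimum`, a tuple such as (2, 6)."""
    return version_tuple(found) >= minimum

def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < (3, 10):
        problems.append("Python >= 3.10 is required")
    try:
        import torch  # imported lazily so the check still runs without torch
    except ImportError:
        problems.append("torch is not installed")
        return problems
    if not meets_minimum(torch.__version__, (2, 6)):
        problems.append(f"torch >= 2.6 is required, found {torch.__version__}")
    cuda = torch.version.cuda  # None on CPU-only builds
    if cuda is None or not meets_minimum(cuda, (12, 6)):
        problems.append(f"CUDA >= 12.6 is required, found {cuda}")
    return problems

if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```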
6.2 Download Models
- Download with Hugging Face
pip install -U huggingface_hub
# Download kairos model
# 4B-480P
hf download kairos-agi/kairos-sensenova-4B-480P-pretrained \
--local-dir models/Kairos-model/kairos-sensenova-4B-480P-pretrained
# 4B-720P
hf download kairos-agi/kairos-sensenova-4B-720P \
--local-dir models/Kairos-model/kairos-sensenova-4B-720P
- Download with ModelScope
pip install modelscope
# Download kairos model
# 4B-480P
modelscope download kairos-team/kairos-sensenova-4B-480P-pretrained \
--local_dir models/Kairos-model/kairos-sensenova-4B-480P-pretrained
# 4B-720P
modelscope download kairos-team/kairos-sensenova-4B-720P \
--local_dir models/Kairos-model/kairos-sensenova-4B-720P
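The same Hugging Face downloads can be scripted from Python with `huggingface_hub.snapshot_download`. This is a sketch: the repo ids and target folders are copied from the `hf download` commands above, while the `CHECKPOINTS` mapping and the two helper functions are illustrative names, not part of the repository.

```python
from pathlib import Path

# Repo ids copied from the `hf download` commands above.
CHECKPOINTS = {
    "4B-480P": "kairos-agi/kairos-sensenova-4B-480P-pretrained",
    "4B-720P": "kairos-agi/kairos-sensenova-4B-720P",
}

def local_dir_for(repo_id, root="models/Kairos-model"):
    """Mirror the CLI layout: <root>/<repository name without the owner prefix>."""
    return Path(root) / repo_id.split("/", 1)[1]

def download(variant, root="models/Kairos-model"):
    """Download one checkpoint and return its local directory."""
    # Imported lazily so the path helpers work without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    repo_id = CHECKPOINTS[variant]
    target = local_dir_for(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target

if __name__ == "__main__":
    print(download("4B-480P"))
```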
6.3 Run Inference
# Step 1: Fetch the auxiliary models
mkdir -p models/Qwen models/Wan2.1-T2V-14B
# Download Qwen2.5-VL, used as the text encoder
hf download Qwen/Qwen2.5-VL-7B-Instruct-AWQ \
--local-dir models/Qwen/Qwen2.5-VL-7B-Instruct-AWQ
# Download the Wan2.1 VAE, used as the VAE encoder/decoder
hf download Wan-AI/Wan2.1-T2V-14B \
--local-dir models/Wan2.1-T2V-14B \
--include "Wan2.1_VAE.pth"
# Step 2: Run the examples
# Text2Video
bash examples/inference.sh examples/example_t2v.json
# Text&FirstImage2Video
bash examples/inference.sh examples/example_ti2v.json
# FirstImage2Video
bash examples/inference.sh examples/example_i2v.json
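A quick pre-flight check can catch a missing download before an inference run starts. The sketch below assumes the example configs expect the auxiliary models at the paths used in Step 1; that is an assumption, since the contents of the JSON files are not shown here. `REQUIRED` and `missing_assets` are hypothetical names.

```python
from pathlib import Path

# Paths taken from the download commands in Step 1; adjust them if your configs differ.
REQUIRED = [
    "models/Qwen/Qwen2.5-VL-7B-Instruct-AWQ",
    "models/Wan2.1-T2V-14B/Wan2.1_VAE.pth",
]

def missing_assets(root=".", required=REQUIRED):
    """Return the required paths, relative to `root`, that do not exist on disk."""
    base = Path(root)
    return [rel for rel in required if not (base / rel).exists()]

if __name__ == "__main__":
    missing = missing_assets()
    if missing:
        print("Missing model files:")
        for rel in missing:
            print("  ", rel)
    else:
        print("All auxiliary model files are in place.")
```

Pass an extended `required` list to also check for the Kairos checkpoint directory you downloaded in section 6.2.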
7. About Us
Developed and maintained by the Kairos Team. We specialize in Embodied Intelligence and World Model research, with a mission to build Artificial General Intelligence (AGI) that truly understands the physical world. Our goal is to accelerate the industrialization of embodied technologies and reshape the global landscape of AI competition.
8. License
Kairos is open-sourced under the Apache License 2.0. Feel free to use, modify, and build commercial products on top of it. Check the LICENSE file for the full text.
9. Acknowledgements
We would like to thank the contributors to Qwen-Image, Wan2.1, DiffSynth-Studio and HuggingFace for their open-source research contributions.
Star us on GitHub if you find Kairos 3.0 helpful!