---
license: apache-2.0
base_model:
- Wan-AI/Wan2.1-T2V-14B
pipeline_tag: text-to-video
tags:
- diffusion-single-file
- text-to-video
- video-to-video
- realtime
library_name: diffusers
---
Krea Realtime 14B is distilled from the [Wan 2.1 14B text-to-video model](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) using Self-Forcing, a technique for converting regular video diffusion models into autoregressive models. It achieves a text-to-video inference speed of **11fps** using 4 inference steps on a single NVIDIA B200 GPU. For more details on our training methodology and sampling innovations, refer to our [technical blog post](https://www.krea.ai/blog/krea-realtime-14b).
Inference code can be found [here](https://github.com/krea-ai/realtime-video).
- Our model is over **10x larger than existing realtime video models**
- We introduce **novel techniques for mitigating error accumulation**, including **KV Cache Recomputation** and **KV Cache Attention Bias** (a rough sketch of the attention-bias idea follows this list)
- We develop **memory optimizations specific to autoregressive video diffusion models** that facilitate training large autoregressive models
- **Our model enables realtime interactive capabilities**: Users can modify prompts mid-generation, restyle videos on-the-fly, and see first frames within 1 second
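The exact formulations for these techniques are described in the blog post. As a rough, hypothetical illustration of the KV Cache Attention Bias idea only (not the model's actual implementation), the sketch below adds a constant negative bias to the attention logits of cached key/value tokens, so that stale context from earlier blocks is down-weighted relative to the current block. All names, shapes, and the bias value are illustrative assumptions.
```py
import torch

def attention_with_cache_bias(q, k_new, v_new, k_cache, v_cache, cache_bias=-1.0):
    """Scaled dot-product attention where cached keys/values get an additive
    logit bias (illustrative sketch, not the model's actual implementation)."""
    # q, k_new, v_new:    (B, H, T_new, D)   current block
    # k_cache, v_cache:   (B, H, T_cache, D) cached context from earlier blocks
    k = torch.cat([k_cache, k_new], dim=2)
    v = torch.cat([v_cache, v_new], dim=2)
    logits = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5  # (B, H, T_new, T_cache + T_new)
    # down-weight attention to the cached tokens with a constant negative bias
    logits[..., : k_cache.shape[2]] += cache_bias
    return logits.softmax(dim=-1) @ v

# toy usage with random tensors
B, H, T_new, T_cache, D = 1, 8, 16, 64, 64
out = attention_with_cache_bias(
    torch.randn(B, H, T_new, D), torch.randn(B, H, T_new, D), torch.randn(B, H, T_new, D),
    torch.randn(B, H, T_cache, D), torch.randn(B, H, T_cache, D),
)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```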
# Video To Video
Krea Realtime allows users to stream real videos, webcam inputs, or canvas primitives into the model, unlocking controllable video synthesis and editing.
# Text To Video
Krea Realtime allows users to generate videos in a streaming fashion with ~1s time to first frame.
# Use it with our inference code
Set up
```bash
sudo apt install ffmpeg # install if you haven't already
git clone https://github.com/krea-ai/realtime-video
cd realtime-video
uv sync
uv pip install flash_attn --no-build-isolation
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir-use-symlinks False --local-dir wan_models/Wan2.1-T2V-1.3B # base Wan 2.1 weights
huggingface-cli download krea/krea-realtime-video krea-realtime-video-14b.safetensors --local-dir-use-symlinks False --local-dir checkpoints # Krea Realtime 14B checkpoint
```
Run
```bash
export MODEL_FOLDER=Wan-AI
export CUDA_VISIBLE_DEVICES=0 # pick the GPU you want to serve on
export DO_COMPILE=true
uvicorn release_server:app --host 0.0.0.0 --port 8000
```
Then open the web app at http://localhost:8000/ in your browser.
(For more advanced use cases and custom pipelines, check out our GitHub repository: https://github.com/krea-ai/realtime-video)
# Use it with 🧨 diffusers
Krea Realtime 14B can be used with the `diffusers` library via the new Modular Diffusers structure (text-to-video is supported for now; video-to-video support is coming soon).
```bash
# Install diffusers from main
pip install git+https://github.com/huggingface/diffusers.git
```
```py
import torch
from collections import deque

from diffusers import ModularPipelineBlocks
from diffusers.modular_pipelines import PipelineState, WanModularPipeline
from diffusers.utils import export_to_video

repo_id = "krea/krea-realtime-video"

# Load the modular pipeline blocks and assemble the pipeline
blocks = ModularPipelineBlocks.from_pretrained(repo_id, trust_remote_code=True)
pipe = WanModularPipeline(blocks, repo_id)
pipe.load_components(
    trust_remote_code=True,
    device_map="cuda",
    torch_dtype={"default": torch.bfloat16, "vae": torch.float16},
)

num_frames_per_block = 3
num_blocks = 9
frames = []

# The pipeline state carries the rolling frame cache across per-block calls
state = PipelineState()
state.set("frame_cache_context", deque(maxlen=pipe.config.frame_cache_len))

prompt = ["a cat sitting on a boat"]

# Fuse the QKV projections in the self-attention layers for faster inference
for block in pipe.transformer.blocks:
    block.self_attn.fuse_projections()

# Generate the video autoregressively, one block of frames at a time
for block_idx in range(num_blocks):
    state = pipe(
        state,
        prompt=prompt,
        num_inference_steps=6,
        num_blocks=num_blocks,
        num_frames_per_block=num_frames_per_block,
        block_idx=block_idx,
        generator=torch.Generator("cuda").manual_seed(42),
    )
    frames.extend(state.values["videos"][0])

export_to_video(frames, "output.mp4", fps=16)
```
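Because the pipeline is invoked once per block, prompts can in principle be swapped between calls to steer the generation mid-stream, which is one way to exercise the interactive restyling described above from Python. A minimal sketch that reuses `pipe`, `num_blocks`, and `num_frames_per_block` from the example above, assuming the per-block call accepts a new `prompt` at any `block_idx` (the prompts below are illustrative):
```py
# Reuses `pipe`, `num_blocks`, and `num_frames_per_block` from the example above.
import torch
from collections import deque

from diffusers.modular_pipelines import PipelineState
from diffusers.utils import export_to_video

prompts = ["a cat sitting on a boat", "the cat is now wearing a pirate hat"]  # illustrative prompts

state = PipelineState()
state.set("frame_cache_context", deque(maxlen=pipe.config.frame_cache_len))

frames = []
for block_idx in range(num_blocks):
    # Switch to the second prompt halfway through the generation
    prompt = [prompts[0]] if block_idx < num_blocks // 2 else [prompts[1]]
    state = pipe(
        state,
        prompt=prompt,
        num_inference_steps=6,
        num_blocks=num_blocks,
        num_frames_per_block=num_frames_per_block,
        block_idx=block_idx,
        generator=torch.Generator("cuda").manual_seed(42),
    )
    frames.extend(state.values["videos"][0])

export_to_video(frames, "restyled.mp4", fps=16)
```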