---
license: apache-2.0
base_model:
- Wan-AI/Wan2.1-T2V-14B
pipeline_tag: text-to-video
tags:
- diffusion-single-file
- text-to-video
- video-to-video
- realtime
library_name: diffusers
---

Krea Realtime 14B is distilled from the [Wan 2.1 14B text-to-video model](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) using Self-Forcing, a technique for converting regular video diffusion models into autoregressive models. It achieves a text-to-video inference speed of **11fps** using 4 inference steps on a single NVIDIA B200 GPU. For more details on our training methodology and sampling innovations, refer to our [technical blog post](https://www.krea.ai/blog/krea-realtime-14b). Inference code can be found [here](https://github.com/krea-ai/realtime-video).

- Our model is over **10x larger than existing realtime video models**
- We introduce **novel techniques for mitigating error accumulation**, including **KV Cache Recomputation** and **KV Cache Attention Bias** (see the sketch after this list)
- We develop **memory optimizations specific to autoregressive video diffusion models** that facilitate training large autoregressive models
- **Our model enables realtime interactive capabilities**: users can modify prompts mid-generation, restyle videos on-the-fly, and see the first frames within 1 second
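For intuition, here is a minimal, hypothetical sketch of the general idea behind biasing attention over cached keys: an additive penalty is applied to the attention logits of cached positions before the softmax so that new frames rely less on stale context. The function name, bias value, and tensor layout below are illustrative assumptions only; the actual formulation of KV Cache Attention Bias (and KV Cache Recomputation) is described in the [technical blog post](https://www.krea.ai/blog/krea-realtime-14b).

```py
import torch

def attention_with_cache_bias(q, k_cache, v_cache, k_new, v_new, cache_bias=-1.0):
    # Hypothetical illustration, not the exact formulation used in Krea Realtime.
    # q:       (batch, heads, q_len, dim)      queries for the current block
    # k_cache: (batch, heads, cache_len, dim)  keys cached from earlier blocks
    # k_new:   (batch, heads, new_len, dim)    keys for the current block
    k = torch.cat([k_cache, k_new], dim=2)
    v = torch.cat([v_cache, v_new], dim=2)
    logits = q @ k.transpose(-1, -2) / (q.shape[-1] ** 0.5)
    # Penalize attention to cached (older) positions so accumulated errors in
    # past frames leak less into newly generated frames.
    logits[..., : k_cache.shape[2]] += cache_bias
    return logits.softmax(dim=-1) @ v
```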
# Video To Video

Krea Realtime allows users to stream real videos, webcam inputs, or canvas primitives into the model, unlocking controllable video synthesis and editing.

# Text To Video

Krea Realtime allows users to generate videos in a streaming fashion with ~1s time to first frame.
# Use it with our inference code

Set up:

```bash
sudo apt install ffmpeg # install if you haven't already
git clone https://github.com/krea-ai/realtime-video
cd realtime-video
uv sync
uv pip install flash_attn --no-build-isolation
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir-use-symlinks False --local-dir wan_models/Wan2.1-T2V-1.3B
huggingface-cli download krea/krea-realtime-video krea-realtime-video-14b.safetensors --local-dir-use-symlinks False --local-dir checkpoints/krea-realtime-video-14b.safetensors
```

Run:

```bash
export MODEL_FOLDER=Wan-AI
export CUDA_VISIBLE_DEVICES=0 # pick the GPU you want to serve on
export DO_COMPILE=true
uvicorn release_server:app --host 0.0.0.0 --port 8000
```

Then use the web app at http://localhost:8000/ in your browser. For more advanced use cases and custom pipelines, check out our GitHub repository: https://github.com/krea-ai/realtime-video

# Use it with 🧨 diffusers

Krea Realtime 14B can be used with the `diffusers` library via the new Modular Diffusers structure (text-to-video is supported for now; video-to-video is coming soon).

```bash
# Install diffusers from main
pip install git+https://github.com/huggingface/diffusers.git
```

```py
import torch
from collections import deque

from diffusers import ModularPipelineBlocks
from diffusers.modular_pipelines import PipelineState, WanModularPipeline
from diffusers.utils import export_to_video

repo_id = "krea/krea-realtime-video"
blocks = ModularPipelineBlocks.from_pretrained(repo_id, trust_remote_code=True)
pipe = WanModularPipeline(blocks, repo_id)

pipe.load_components(
    trust_remote_code=True,
    device_map="cuda",
    torch_dtype={"default": torch.bfloat16, "vae": torch.float16},
)

num_frames_per_block = 3
num_blocks = 9

frames = []
state = PipelineState()
# Rolling cache of previously generated frames used as context for the next block.
state.set("frame_cache_context", deque(maxlen=pipe.config.frame_cache_len))

prompt = ["a cat sitting on a boat"]

# Fuse the QKV projections in the self-attention layers for faster inference.
for block in pipe.transformer.blocks:
    block.self_attn.fuse_projections()

# Generate the video autoregressively, one block of frames at a time.
for block_idx in range(num_blocks):
    state = pipe(
        state,
        prompt=prompt,
        num_inference_steps=6,
        num_blocks=num_blocks,
        num_frames_per_block=num_frames_per_block,
        block_idx=block_idx,
        generator=torch.Generator("cuda").manual_seed(42),
    )
    frames.extend(state.values["videos"][0])

export_to_video(frames, "output.mp4", fps=16)
```
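Because generation proceeds block by block, you can change the prompt passed to the pipeline between blocks to restyle a video mid-generation. The snippet below is a minimal sketch of this idea, reusing the setup from the example above; the switch point and second prompt are illustrative, and it assumes the modular pipeline simply conditions each block on whichever prompt is passed in.

```py
# Minimal sketch: swap the prompt halfway through generation to restyle the
# video mid-stream. Assumes `pipe`, `num_blocks`, `num_frames_per_block`,
# and the imports from the example above are already defined.
prompts = {
    0: ["a cat sitting on a boat"],                                # first half
    num_blocks // 2: ["an origami cat sitting on a paper boat"],   # switch here
}

frames = []
state = PipelineState()
state.set("frame_cache_context", deque(maxlen=pipe.config.frame_cache_len))

prompt = prompts[0]
for block_idx in range(num_blocks):
    # Update the prompt when we reach a scheduled switch point.
    prompt = prompts.get(block_idx, prompt)
    state = pipe(
        state,
        prompt=prompt,
        num_inference_steps=6,
        num_blocks=num_blocks,
        num_frames_per_block=num_frames_per_block,
        block_idx=block_idx,
        generator=torch.Generator("cuda").manual_seed(42),
    )
    frames.extend(state.values["videos"][0])

export_to_video(frames, "restyled_output.mp4", fps=16)
```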