Add model card and metadata

#1 opened by nielsr (HF Staff)

Files changed (1): README.md (+76 -0)
---
library_name: diffusers
pipeline_tag: image-to-image
---

# Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion

Stream-DiffVSR is a causally conditioned diffusion framework designed for efficient online Video Super-Resolution (VSR). It operates strictly on past frames to maintain low latency, making it suitable for real-time deployment.
9
+
10
+ [[Paper](https://huggingface.co/papers/2512.23709)] [[Project Page](https://jamichss.github.io/stream-diffvsr-project-page/)] [[GitHub](https://github.com/jamichss/Stream-DiffVSR)]
11
+
12
+ ## Description
13
+ Diffusion-based VSR methods often struggle with latency due to multi-step denoising and reliance on future frames. Stream-DiffVSR addresses this with:
14
+ - **Causal Conditioning:** Operates only on past frames for online processing.
15
+ - **Four-step Distilled Denoiser:** Enables fast inference without sacrificing quality.
16
+ - **Auto-regressive Temporal Guidance (ARTG):** Injects motion-aligned cues during denoising.
17
+ - **Lightweight Temporal Decoder:** Enhances temporal coherence and fine details.
18
+
19
+ Stream-DiffVSR can process 720p frames in 0.328 seconds on an RTX 4090, achieving significant latency reductions compared to prior diffusion-based VSR methods.
20
+
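The quoted per-frame latency translates directly into sustained streaming throughput; a quick sanity check (pure arithmetic, no model code):

```python
# Convert the reported per-frame latency into sustained throughput.
latency_s = 0.328        # seconds per 720p frame on an RTX 4090 (reported above)
fps = 1.0 / latency_s    # frames per second the pipeline can sustain

print(f"{fps:.2f} fps")  # prints "3.05 fps"
```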
## Usage

### Installation

```bash
git clone https://github.com/jamichss/Stream-DiffVSR.git
cd Stream-DiffVSR
conda env create -f requirements.yml
conda activate stream-diffvsr
```
### Inference

You can run inference using the following command. The script will automatically fetch the necessary weights from this repository.

```bash
python inference.py \
    --model_id 'Jamichsu/Stream-DiffVSR' \
    --out_path 'YOUR_OUTPUT_PATH' \
    --in_path 'YOUR_INPUT_PATH' \
    --num_inference_steps 4
```
The expected file structure for the inference input data is as follows:
```
YOUR_INPUT_PATH/
├── seq1/
│   ├── frame_0001.png
│   ├── frame_0002.png
│   └── ...
└── seq2/
    ├── frame_0001.png
    ├── frame_0002.png
    └── ...
```
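If your frames are not already organized this way, a small helper can stage a sequence directory; a minimal sketch, assuming the zero-padded `frame_NNNN.png` naming shown in the tree above:

```python
import tempfile
from pathlib import Path

def stage_sequence(in_path: str, seq_name: str, num_frames: int) -> list[Path]:
    """Create one sequence sub-directory under the input root and return the
    expected frame paths (zero-padded names, as in the layout above)."""
    seq_dir = Path(in_path) / seq_name
    seq_dir.mkdir(parents=True, exist_ok=True)
    return [seq_dir / f"frame_{i:04d}.png" for i in range(1, num_frames + 1)]

root = tempfile.mkdtemp()  # stand-in for YOUR_INPUT_PATH
frames = stage_sequence(root, "seq1", 2)
print([p.name for p in frames])  # ['frame_0001.png', 'frame_0002.png']
```

Point `--in_path` at the root directory, not at an individual sequence folder.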
55
+ For NVIDIA TensorRT acceleration:
56
+ ```bash
57
+ python inference.py \
58
+ --model_id 'Jamichsu/Stream-DiffVSR' \
59
+ --out_path 'YOUR_OUTPUT_PATH' \
60
+ --in_path 'YOUR_INPUT_PATH' \
61
+ --num_inference_steps 4 \
62
+ --enable_tensorrt \
63
+ --image_height <YOUR_TARGET_HEIGHT> \
64
+ --image_width <YOUR_TARGET_WIDTH>
65
+ ```
66
+
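TensorRT engines are compiled for a fixed resolution, so `--image_height`/`--image_width` must match the output size. A sketch of how you might derive them from a low-resolution input; the 4x scale factor and the divisible-by-8 constraint are assumptions (typical for diffusion-based VSR), so confirm both against the repository:

```python
def target_size(lr_h: int, lr_w: int, scale: int = 4, multiple: int = 8) -> tuple[int, int]:
    """Upscale LR dimensions and round up to a multiple
    (diffusion UNets usually require spatial dims divisible by 8)."""
    def round_up(v: int) -> int:
        return ((v * scale + multiple - 1) // multiple) * multiple
    return round_up(lr_h), round_up(lr_w)

print(target_size(180, 320))  # (720, 1280): a 180x320 input upscaled 4x
```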
## Citation

If you find this work useful, please cite:
```bibtex
@article{shiu2025streamdiffvsr,
  title={Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion},
  author={Shiu, Hau-Shiang and Lin, Chin-Yang and Wang, Zhixiang and Hsiao, Chi-Wei and Yu, Po-Fan and Chen, Yu-Chih and Liu, Yu-Lun},
  journal={arXiv preprint arXiv:2512.23709},
  year={2025}
}
```