Add model card and metadata (#1)
by nielsr (HF Staff), opened

README.md ADDED

---
library_name: diffusers
pipeline_tag: image-to-image
---

# Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion

Stream-DiffVSR is a causally conditioned diffusion framework for efficient online video super-resolution (VSR). It operates strictly on past frames to keep latency low, making it suitable for real-time deployment.

[[Paper](https://huggingface.co/papers/2512.23709)] [[Project Page](https://jamichss.github.io/stream-diffvsr-project-page/)] [[GitHub](https://github.com/jamichss/Stream-DiffVSR)]

## Description

Diffusion-based VSR methods often suffer from high latency because they rely on multi-step denoising and on access to future frames. Stream-DiffVSR addresses this with:

- **Causal Conditioning:** operates only on past frames, enabling online processing.
- **Four-step Distilled Denoiser:** enables fast inference without sacrificing quality.
- **Auto-regressive Temporal Guidance (ARTG):** injects motion-aligned cues during denoising.
- **Lightweight Temporal Decoder:** enhances temporal coherence and fine details.

Stream-DiffVSR processes a 720p frame in 0.328 seconds on an RTX 4090, a significant latency reduction compared to prior diffusion-based VSR methods.
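The causal, auto-regressive loop the bullets above describe can be sketched as follows. This is an illustrative toy, not the released model: `upscale_stream`, the nearest-neighbour "denoiser", and the scalar blend standing in for ARTG are all stand-ins for the actual distilled diffusion components.

```python
import numpy as np

def upscale_stream(lr_frames, scale=4, blend=0.5):
    """Toy causal VSR loop: each output frame is conditioned only on the
    current low-res frame and the PREVIOUS high-res output (never on
    future frames), mirroring the auto-regressive design.
    Nearest-neighbour upsampling stands in for the 4-step denoiser, and
    a plain temporal blend stands in for motion-aligned ARTG guidance."""
    prev_hr = None
    outputs = []
    for lr in lr_frames:
        # "Denoise" the current frame: here, just nearest-neighbour upsampling.
        hr = np.repeat(np.repeat(lr, scale, axis=0), scale, axis=1)
        if prev_hr is not None:
            # Inject guidance from the previous output (causal: past frames only).
            hr = blend * hr + (1.0 - blend) * prev_hr
        prev_hr = hr
        outputs.append(hr)
    return outputs

frames = [np.full((4, 4), float(i)) for i in range(3)]
hr_frames = upscale_stream(frames)   # 3 frames, each of shape (16, 16)
```

Because each step consumes only the previous output, the loop can run frame-by-frame on a live stream with constant per-frame latency.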

## Usage

### Installation

```bash
git clone https://github.com/jamichss/Stream-DiffVSR.git
cd Stream-DiffVSR
conda env create -f requirements.yml
conda activate stream-diffvsr
```

### Inference

You can run inference with the following command. The script automatically fetches the necessary weights from this repository.

```bash
python inference.py \
    --model_id 'Jamichsu/Stream-DiffVSR' \
    --out_path 'YOUR_OUTPUT_PATH' \
    --in_path 'YOUR_INPUT_PATH' \
    --num_inference_steps 4
```
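As a back-of-envelope capacity estimate based on the 0.328 s/frame figure reported above (720p on an RTX 4090, excluding I/O):

```python
# Rough throughput estimate from the reported per-frame latency.
LATENCY_S = 0.328                     # seconds per 720p frame (RTX 4090)
fps = 1.0 / LATENCY_S                 # ~3.05 frames per second
clip_seconds = 10 * 30 * LATENCY_S    # a 10 s clip at 30 fps takes ~98.4 s
print(f"{fps:.2f} fps, {clip_seconds:.1f} s for a 10 s @ 30 fps clip")
```

So while the per-frame latency is low enough for streaming use, real-time 30 fps playback would still require further batching or acceleration (such as the TensorRT path below).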

The expected file structure for the inference input data is as follows:

```
YOUR_INPUT_PATH/
├── seq1/
│   ├── frame_0001.png
│   ├── frame_0002.png
│   └── ...
├── seq2/
│   ├── frame_0001.png
│   ├── frame_0002.png
│   └── ...
```
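To sanity-check that an input directory matches this layout before launching inference, a small helper like the following can enumerate the sequences (this function is not part of the repository; it only assumes the `<seq>/frame_XXXX.png` layout shown above):

```python
from pathlib import Path

def list_sequences(in_path):
    """Collect {sequence_name: [frame paths]} for a directory laid out as
    YOUR_INPUT_PATH/<seq>/frame_XXXX.png, sorted so frames stay in order."""
    root = Path(in_path)
    sequences = {}
    for seq_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        frames = sorted(seq_dir.glob("*.png"))
        if frames:  # skip directories that contain no PNG frames
            sequences[seq_dir.name] = frames
    return sequences
```

Zero-padded frame names (`frame_0001.png`, ...) matter here: plain lexicographic sorting then matches temporal order.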

For NVIDIA TensorRT acceleration, additionally pass the target resolution:

```bash
python inference.py \
    --model_id 'Jamichsu/Stream-DiffVSR' \
    --out_path 'YOUR_OUTPUT_PATH' \
    --in_path 'YOUR_INPUT_PATH' \
    --num_inference_steps 4 \
    --enable_tensorrt \
    --image_height <YOUR_TARGET_HEIGHT> \
    --image_width <YOUR_TARGET_WIDTH>
```

## Citation

If you find this work useful, please cite:

```bibtex
@article{shiu2025streamdiffvsr,
  title={Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion},
  author={Shiu, Hau-Shiang and Lin, Chin-Yang and Wang, Zhixiang and Hsiao, Chi-Wei and Yu, Po-Fan and Chen, Yu-Chih and Liu, Yu-Lun},
  journal={arXiv preprint arXiv:2512.23709},
  year={2025}
}
```