add model ckpt

Files changed (6)

.gitattributes +2 -0
README.md +78 -0
config.json +14 -0
figs/model.png +3 -0
figs/[email protected] +3 -0
model_200ms.safetensors +3 -0

.gitattributes CHANGED

@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
+*.jpeg filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,3 +1,81 @@
 ---
 license: apache-2.0
+language:
+  - en
+  - zh
+tags:
+  - text-to-speech
 ---
+# MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
+<div align="center">
+[![Paper](https://img.shields.io/badge/arXiv-2510.08392-b31b1b.svg)](https://arxiv.org/pdf/2510.08392)
+[![Github](https://img.shields.io/badge/Github-Page-green)](https://github.com/ASLP-lab/MeanVC)
+[![Demo Page](https://img.shields.io/badge/Demo-Audio%20Samples-green)](https://aslp-lab.github.io/MeanVC/)
+</div>
+**MeanVC** is a lightweight, streaming zero-shot voice conversion system that enables real-time timbre transfer from any source speaker to any target speaker while preserving linguistic content. The system introduces a diffusion transformer with a chunk-wise autoregressive denoising strategy and uses mean flows for efficient single-step inference.
+![img](figs/model.png)
+## ✨ Key Features
+-   **🚀 Streaming Inference**: Real-time voice conversion with chunk-wise processing.
+-   **⚡ Single-Step Generation**: Direct mapping from start to endpoint via mean flows for fast generation.
+-   **🎯 Zero-Shot Capability**: Convert to any unseen target speaker without re-training.
+-   **💾 Lightweight**: Significantly fewer parameters than existing methods.
+-   **🔊 High Fidelity**: Superior speech quality and speaker similarity.
+## 💾 Model Download
+Use the following Python script to download the models into a local directory (e.g., ./checkpoints):
+```python
+from huggingface_hub import snapshot_download
+# Download all necessary models and components for MeanVC
+snapshot_download(
+    "ASLP-lab/MeanVC",
+    allow_patterns=[
+        "model_200ms.safetensors", # The trained MeanVC model weights
+        "meanvc_200ms.pt",       # JIT-compiled model for real-time inference
+        "fastu2++.pt",           # JIT-compiled ASR model
+        "vocos.pt"               # JIT-compiled Vocos vocoder
+    ],
+    local_dir="./checkpoints", # Specify your target directory
+    local_dir_use_symlinks=False
+)
+```
+## 📜 License & Disclaimer
+MeanVC is released under the Apache License 2.0. This open-source license allows you to freely use, modify, and distribute the model, as long as you include the appropriate copyright notice and disclaimer.
+MeanVC is designed for research and legitimate applications in voice conversion technology. Users must obtain proper consent from individuals whose voices are being converted or used as references. We strongly discourage any malicious use, including impersonation, fraud, or the creation of misleading audio content. Users are solely responsible for ensuring that their use cases comply with ethical standards and legal requirements.
+## 📄 Citation
+If you find our work helpful, please cite our paper:
+```bibtex
+@article{ma2025meanvc,
+  title={MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows},
+  author={Ma, Guobin and Yao, Jixun and Ning, Ziqian and Jiang, Yuepeng and Xiong, Lingxin and Xie, Lei and Zhu, Pengcheng},
+  journal={arXiv preprint arXiv:2510.08392},
+  year={2025}
+}
+```
+## 📧 Contact
+To leave a message for our research team, feel free to email [email protected]
+<p align="center">
+    <img src="figs/[email protected]" width="500"/>
+</p>

config.json ADDED

@@ -0,0 +1,14 @@
+{
+    "model_type": "DiT",
+    "model": {
+        "dim": 512,
+        "depth": 4,
+        "heads": 2,
+        "ff_mult": 2,
+        "bn_dim": 256,
+        "conv_layers": 4,
+        "chunk_size": 20,
+        "dropout": 0.0,
+        "qk_norm": "rms_norm"
+    }
+}
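
As a reading aid for the configuration above, the following minimal sketch derives a few quantities from it. The interpretations are assumptions that this commit does not confirm: `ff_mult` is taken to scale `dim` into the feed-forward hidden width, `chunk_size` is taken to be a count of frames, and the 10 ms frame hop (`ASSUMED_HOP_MS`) is a guess, chosen because it would make a 20-frame chunk line up with the `200ms` in the checkpoint name. The helper `summarize` is hypothetical and not part of the MeanVC code.

```python
import json

# Contents of config.json as added in this commit (diff "+" markers removed).
CONFIG_TEXT = """
{
    "model_type": "DiT",
    "model": {
        "dim": 512,
        "depth": 4,
        "heads": 2,
        "ff_mult": 2,
        "bn_dim": 256,
        "conv_layers": 4,
        "chunk_size": 20,
        "dropout": 0.0,
        "qk_norm": "rms_norm"
    }
}
"""

ASSUMED_HOP_MS = 10  # assumption: the frame hop is not stated in this commit


def summarize(config_text: str, hop_ms: int = ASSUMED_HOP_MS) -> dict:
    """Derive per-head width, assumed feed-forward width, and chunk duration."""
    model = json.loads(config_text)["model"]
    if model["dim"] % model["heads"] != 0:
        raise ValueError("dim must be divisible by heads")
    return {
        "head_dim": model["dim"] // model["heads"],
        # Assumes ff_mult multiplies dim, as in many transformer configs.
        "ff_hidden_dim": model["dim"] * model["ff_mult"],
        # Assumes chunk_size counts frames of hop_ms milliseconds each.
        "chunk_ms": model["chunk_size"] * hop_ms,
    }


summary = summarize(CONFIG_TEXT)
print(summary)
```

If the actual frame hop differs, pass it as `hop_ms`; the other two values depend only on the fields in the file.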

figs/model.png ADDED

Git LFS Details

SHA256: 0514278f969b291eceab1c2fa4c4171008ba47688798e9ecaa4c6d3cb9c2b826
Pointer size: 131 Bytes
Size of remote file: 292 kB

figs/[email protected] ADDED

Git LFS Details

SHA256: 41eae6df7b8458e13ffd2de14876f95cd3f7fad91f8527a751a7dd63347c6a71
Pointer size: 132 Bytes
Size of remote file: 1.56 MB
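
The pointer sizes reported above follow from the Git LFS pointer layout: a `version` line, an `oid sha256:` line, and a `size` line, each terminated by a newline. The sketch below rebuilds such a pointer to show where the byte count comes from. `make_pointer` is a hypothetical helper, and `292000` is a placeholder byte count (the page only shows the rounded "292 kB"); only the number of digits in the size affects the pointer length.

```python
def make_pointer(sha256_hex: str, size_bytes: int) -> str:
    """Build the text of a Git LFS pointer file (spec v1 layout)."""
    if len(sha256_hex) != 64:
        raise ValueError("expected a 64-character hex digest")
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{sha256_hex}\n"
        f"size {size_bytes}\n"
    )


# SHA256 of figs/model.png as listed in the Git LFS details above.
MODEL_PNG_SHA256 = "0514278f969b291eceab1c2fa4c4171008ba47688798e9ecaa4c6d3cb9c2b826"

# Placeholder size: any six-digit byte count gives the same pointer length.
pointer = make_pointer(MODEL_PNG_SHA256, 292000)
print(len(pointer.encode("utf-8")))  # 131
```

A seven-digit size adds one byte, which matches the 132 bytes shown for the 1.56 MB file.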

model_200ms.safetensors ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5c2d9ed6c8c149d4fdf9ba6f17ebbc675784010585344448136261c874decb0f
+size 56271424
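
The pointer above records the SHA-256 digest and byte size of the real checkpoint, so a downloaded copy can be checked against it. The following is a minimal sketch using only the standard library. `parse_pointer` and `matches_pointer` are hypothetical helpers, and the pointer text is copied from the diff with the leading `+` markers removed.

```python
import hashlib
from pathlib import Path

POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:5c2d9ed6c8c149d4fdf9ba6f17ebbc675784010585344448136261c874decb0f
size 56271424
"""


def parse_pointer(text: str) -> dict:
    """Split each 'key value' line of an LFS pointer and extract oid and size."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo}")
    return {"sha256": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict) -> bool:
    """Return True if the file has the pointer's byte size and SHA-256 digest."""
    if path.stat().st_size != pointer["size"]:
        return False
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in 1 MiB blocks so large checkpoints are not read into memory at once.
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest() == pointer["sha256"]


expected = parse_pointer(POINTER_TEXT)
print(expected["size"])  # 56271424
```

After running the README's download script, `matches_pointer(Path("./checkpoints/model_200ms.safetensors"), expected)` should return `True` for an intact copy of this revision, assuming the file landed in the README's example directory.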