Harshil748 committed
Commit 72ce17a · verified · 1 Parent(s): 9e5ede4

Update README.md

Files changed (1)
  1. README.md +161 -22
README.md CHANGED
@@ -1,10 +1,17 @@
  ---
  license: mit
  tags:
  - tts
  - text-to-speech
  - indian-languages
  - vits
  language:
  - hi
  - bn
@@ -19,32 +26,164 @@ language:
  - gu
  ---

- # VoiceAPI Models

- TTS models for 11 Indian languages, 21 voices total.

- ## Languages & Voices

- | Language | Code | Female | Male |
- |----------|------|--------|------|
- | Hindi | hi | ✅ | ✅ |
- | Bengali | bn | ✅ | ✅ |
- | Marathi | mr | ✅ | ✅ |
- | Telugu | te | ✅ | ✅ |
- | Kannada | kn | ✅ | ✅ |
- | English | en | ✅ | ✅ |
- | Bhojpuri | bho | ✅ | ✅ |
- | Maithili | mai | ✅ | ✅ |
- | Magahi | mag | ✅ | ✅ |
- | Chhattisgarhi | hne | ✅ | ✅ |
- | Gujarati | gu | MMS | - |

- ## Model Types

- - **JIT Models** (.pt): SYSPIN VITS models (most languages)
- - **Coqui Models** (.pth): Bhojpuri male/female
- - **MMS**: Facebook MMS for Gujarati

- ## Usage

- These models are used by the [VoiceAPI](https://huggingface.co/spaces/Harshil748/VoiceAPI) TTS service.

  ---
+ colorFrom: blue
+ colorTo: purple
+ sdk: docker
+ app_port: 7860
  license: mit
+ title: VoiceAPI
  tags:
  - tts
  - text-to-speech
  - indian-languages
  - vits
+ - multilingual
+ - speech-synthesis
  language:
  - hi
  - bn

  - gu
  ---

+ # 🎙️ VoiceAPI - Multi-lingual Indian Language TTS

+ A **multi-speaker, multilingual text-to-speech (TTS) synthesizer** supporting 11 Indian languages with 21 voice options.

+ **Live API**: [https://harshil748-voiceapi.hf.space](https://harshil748-voiceapi.hf.space)

+ ## 🌟 Features

+ - **11 Indian Languages**: Hindi, Bengali, Marathi, Telugu, Kannada, Gujarati, Bhojpuri, Chhattisgarhi, Maithili, Magahi, English
+ - **21 Voice Options**: Male and female voices for every language except Gujarati, which has a single MMS voice
+ - **High-Quality Audio**: 22050 Hz sample rate (16000 Hz for Gujarati MMS), natural prosody
+ - **REST API**: Simple GET/POST endpoints for easy integration
+ - **Real-time Synthesis**: Fast inference on CPU/GPU

+ ## 🗣️ Supported Languages

+ | Language | Code | Female | Male | Script |
+ |----------|------|--------|------|--------|
+ | Hindi | hi | ✅ | ✅ | देवनागरी |
+ | Bengali | bn | ✅ | ✅ | বাংলা |
+ | Marathi | mr | ✅ | ✅ | देवनागरी |
+ | Telugu | te | ✅ | ✅ | తెలుగు |
+ | Kannada | kn | ✅ | ✅ | ಕನ್ನಡ |
+ | Gujarati | gu | ✅ (MMS) | - | ગુજરાતી |
+ | Bhojpuri | bho | ✅ | ✅ | देवनागरी |
+ | Chhattisgarhi | hne | ✅ | ✅ | देवनागरी |
+ | Maithili | mai | ✅ | ✅ | देवनागरी |
+ | Magahi | mag | ✅ | ✅ | देवनागरी |
+ | English | en | ✅ | ✅ | Latin |
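
The voice count follows directly from the table: ten languages with two voices each, plus one Gujarati voice. A minimal sketch that transcribes the table into a dictionary and enumerates the voices; the `code_gender` identifiers (for example `hi_female`) are hypothetical and are not taken from the project's `config.py`.

```python
# Voice availability per language code, transcribed from the table above.
VOICES_BY_LANG = {
    "hi": ("female", "male"),
    "bn": ("female", "male"),
    "mr": ("female", "male"),
    "te": ("female", "male"),
    "kn": ("female", "male"),
    "gu": ("female",),  # single MMS voice, no male voice listed
    "bho": ("female", "male"),
    "hne": ("female", "male"),
    "mai": ("female", "male"),
    "mag": ("female", "male"),
    "en": ("female", "male"),
}


def list_voices():
    """Return one hypothetical ID per available voice, e.g. 'hi_female'."""
    return [
        f"{code}_{gender}"
        for code, genders in VOICES_BY_LANG.items()
        for gender in genders
    ]


if __name__ == "__main__":
    voices = list_voices()
    print(f"{len(VOICES_BY_LANG)} languages, {len(voices)} voices")
```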

+ ## 📡 API Usage
+
+ ### Endpoint
+
+ ```
+ GET/POST /Get_Inference
+ ```
+
+ ### Parameters
+
+ | Parameter | Type | Required | Description |
+ |-----------|------|----------|-------------|
+ | `text` | string | Yes | Text to synthesize (lowercase for English) |
+ | `lang` | string | Yes | Language name (hindi, bengali, etc.) |
+ | `speaker_wav` | file | Yes | Reference WAV file (for API compatibility) |
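
The parameter rules above can be checked on the client before a request is sent. The sketch below is one possible way to do that, not the service's own validation: the README confirms only `hindi` and `bengali` as `lang` values, so the remaining names in `ASSUMED_LANG_NAMES` are an assumption that the other languages follow the same lowercase pattern, and `build_params` is a hypothetical helper.

```python
# Assumed `lang` values. Only "hindi" and "bengali" appear in the README;
# the others are guessed from the language list and may differ on the server.
ASSUMED_LANG_NAMES = {
    "hindi", "bengali", "marathi", "telugu", "kannada", "gujarati",
    "bhojpuri", "chhattisgarhi", "maithili", "magahi", "english",
}


def build_params(text: str, lang: str) -> dict:
    """Build the query parameters for /Get_Inference (hypothetical helper)."""
    lang = lang.strip().lower()
    if lang not in ASSUMED_LANG_NAMES:
        raise ValueError(f"unsupported language name: {lang!r}")
    if not text.strip():
        raise ValueError("text must not be empty")
    if lang == "english":
        # The parameter table asks for lowercase English text.
        text = text.lower()
    return {"text": text, "lang": lang}


if __name__ == "__main__":
    print(build_params("Hello World", "english"))
```

The returned dictionary can be passed as `params=` to `requests`; the `speaker_wav` file still has to be attached separately.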
+
+ ### Example (Python)
+
+ ```python
+ import requests
+
+ base_url = 'https://harshil748-voiceapi.hf.space/Get_Inference'
+ WavPath = 'reference.wav'
+
+ # Query parameters: the text to synthesize and the language name
+ params = {
+     'text': 'नमस्ते, आप कैसे हैं?',
+     'lang': 'hindi',
+ }
+
+ # Attach the reference WAV as multipart form data
+ with open(WavPath, "rb") as AudioFile:
+     response = requests.get(base_url, params=params, files={'speaker_wav': AudioFile.read()})
+
+ # On success the response body is the synthesized audio
+ if response.status_code == 200:
+     with open('output.wav', 'wb') as f:
+         f.write(response.content)
+     print("Audio saved as 'output.wav'")
+ else:
+     print(f"Request failed with status {response.status_code}")
+ ```
+
+ ### Example (cURL)
+
+ ```bash
+ curl -X POST "https://harshil748-voiceapi.hf.space/Get_Inference?text=hello&lang=english" \
+   -F "speaker_wav=@reference.wav" \
+   -o output.wav
+ ```
+
+ ## 🏗️ Model Architecture
+
+ - **Base Model**: VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech)
+ - **Encoder**: Transformer-based text encoder (6 layers, 192 hidden channels)
+ - **Decoder**: HiFi-GAN neural vocoder
+ - **Duration Predictor**: Stochastic duration predictor for natural prosody
+ - **Sample Rate**: 22050 Hz (16000 Hz for Gujarati MMS)
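
Because Gujarati output uses a different sample rate from the other languages, a client may want to confirm the rate of a returned file before mixing or resampling it. A minimal sketch using the standard-library `wave` module, assuming the API returns uncompressed PCM WAV (the README does not state the encoding) and that `lang` is the language name sent in the request; the helper names are hypothetical.

```python
import wave

# Sample rates stated in the README: 22050 Hz everywhere except Gujarati (MMS).
DEFAULT_RATE_HZ = 22050
RATE_OVERRIDES_HZ = {"gujarati": 16000}


def expected_sample_rate(lang: str) -> int:
    """Return the sample rate the README documents for a language name."""
    return RATE_OVERRIDES_HZ.get(lang.strip().lower(), DEFAULT_RATE_HZ)


def wav_duration_seconds(path: str) -> float:
    """Duration is the number of frames divided by frames per second."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def has_expected_rate(path: str, lang: str) -> bool:
    """Check a WAV file's header against the documented rate for `lang`."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == expected_sample_rate(lang)
```

For example, `has_expected_rate("output.wav", "hindi")` compares the file header against 22050 Hz.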
+
+ ## 📊 Training
+
+ ### Datasets Used
+
+ | Dataset | Languages | Source | License |
+ |---------|-----------|--------|---------|
+ | OpenSLR-103 | Hindi | [OpenSLR](https://www.openslr.org/103/) | CC BY 4.0 |
+ | OpenSLR-37 | Bengali | [OpenSLR](https://www.openslr.org/37/) | CC BY 4.0 |
+ | OpenSLR-64 | Marathi | [OpenSLR](https://www.openslr.org/64/) | CC BY 4.0 |
+ | OpenSLR-66 | Telugu | [OpenSLR](https://www.openslr.org/66/) | CC BY 4.0 |
+ | OpenSLR-79 | Kannada | [OpenSLR](https://www.openslr.org/79/) | CC BY 4.0 |
+ | OpenSLR-78 | Gujarati | [OpenSLR](https://www.openslr.org/78/) | CC BY 4.0 |
+ | Common Voice | Hindi, Bengali | [Mozilla](https://commonvoice.mozilla.org/) | CC0 |
+ | IndicTTS | Multiple | [IIT Madras](https://www.iitm.ac.in/donlab/tts/) | Research |
+ | Indic-Voices | Multiple | [AI4Bharat](https://ai4bharat.iitm.ac.in/indic-voices/) | CC BY 4.0 |
+
+ ### Training Configuration
+
+ - **Epochs**: 1000
+ - **Batch Size**: 32
+ - **Learning Rate**: 2e-4
+ - **Optimizer**: AdamW
+ - **FP16 Training**: Enabled
+ - **Hardware**: NVIDIA V100/A100 GPUs
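
To get a feel for what these settings imply, the listed hyperparameters can be collected in a small config object and used to estimate the number of optimizer steps. This is an illustrative sketch only: `TrainingConfig` and `total_steps` are hypothetical names, not the contents of `training/configs/`, and the estimate assumes the last partial batch of each epoch is kept rather than dropped.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters copied from the list above (hypothetical container)."""
    epochs: int = 1000
    batch_size: int = 32
    learning_rate: float = 2e-4
    optimizer: str = "AdamW"
    fp16: bool = True

    def total_steps(self, num_utterances: int) -> int:
        """Optimizer steps for a dataset size, keeping the final partial batch."""
        steps_per_epoch = math.ceil(num_utterances / self.batch_size)
        return steps_per_epoch * self.epochs


if __name__ == "__main__":
    cfg = TrainingConfig()
    # 3200 utterances is a made-up dataset size used only for illustration.
    print(cfg.total_steps(3200))
```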
+
+ See the `training/` directory for full training scripts and configurations.
+
+ ## 🚀 Deployment
+
+ This API is deployed on HuggingFace Spaces using Docker:
+
+ ```dockerfile
+ FROM python:3.10-slim
+ # ... installs dependencies
+ # Downloads models from Harshil748/VoiceAPI-Models
+ # Runs FastAPI server on port 7860
+ ```
+
+ Models are hosted separately at [Harshil748/VoiceAPI-Models](https://huggingface.co/Harshil748/VoiceAPI-Models) (~8GB).
+
+ ## 📁 Project Structure
+
+ ```
+ VoiceAPI/
+ ├── app.py                  # HuggingFace Spaces entry point
+ ├── Dockerfile              # Docker configuration
+ ├── requirements.txt        # Python dependencies
+ ├── download_models.py      # Model downloader
+ ├── src/
+ │   ├── api.py              # FastAPI REST server
+ │   ├── engine.py           # TTS inference engine
+ │   ├── config.py           # Voice configurations
+ │   └── tokenizer.py        # Text tokenization
+ └── training/
+     ├── train_vits.py       # VITS training script
+     ├── prepare_dataset.py  # Data preparation
+     ├── export_model.py     # Model export
+     ├── datasets.csv        # Dataset links
+     └── configs/            # Training configs
+ ```
173
+
174
+ ## 📜 License
175
+
176
+ - **Code**: MIT License
177
+ - **Models**: CC BY 4.0 (following SYSPIN licensing)
178
+ - **Datasets**: Individual licenses (see training/datasets.csv)
179
+
180
+ ## 🙏 Acknowledgments
181
+
182
+ - [SYSPIN IISc SPIRE Lab](https://syspin.iisc.ac.in/) for pre-trained VITS models
183
+ - [Facebook MMS](https://github.com/facebookresearch/fairseq/tree/main/examples/mms) for Gujarati TTS
184
+ - [Coqui TTS](https://github.com/coqui-ai/TTS) for the TTS library
185
+ - [AI4Bharat](https://ai4bharat.iitm.ac.in/) for Indian language resources
186
+
187
+ ## 📧 Contact
188
+
189
+ Built for the **Voice Tech for All** Hackathon - Multi-lingual TTS for healthcare assistants serving low-income communities.