Create README.md
        README.md
    ADDED
    
    | @@ -0,0 +1,160 @@ | |
| 1 | 
            +
            ---
         | 
| 2 | 
            +
            license: apache-2.0
         | 
| 3 | 
            +
            base_model:
         | 
| 4 | 
            +
            - coqui/XTTS-v2
         | 
| 5 | 
            +
            ---
         | 
| 6 | 
            +
            # Auralis
         | 
| 7 | 
            +
             | 
| 8 | 
            +
            ## Model Details
         | 
| 9 | 
            +
             | 
| 10 | 
            +
            **Model Name:** Auralis  
         | 
| 11 | 
            +
             | 
| 12 | 
            +
            **Model Architecture:** Based on [Coqui XTTS-v2](https://huggingface.co/coqui/XTTS-v2) 
         | 
| 13 | 
            +
             | 
| 14 | 
            +
            **License:**  
         | 
| 15 | 
            +
            - license: Apache 2.0  
         | 
| 16 | 
            +
            - base_model: XTTS-v2 Components [Coqui AI License](https://coqui.ai/cpml)
         | 
| 17 | 
            +
             | 
| 18 | 
            +
            **Language Support:** English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese (Simplified), Hungarian, Korean, Japanese, Hindi
         | 
| 19 | 
            +
              
         | 
| 20 | 
            +
            **Developed by:** [AstraMind.ai](https://www.astramind.ai)
         | 
| 21 | 
            +
              
         | 
| 22 | 
            +
            **GitHub:** [AstraMind AI](https://github.com/astramind-ai/Auralis/tree/main)
         | 
| 23 | 
            +
             | 
| 24 | 
            +
            **Primary Use Case:** Text-to-Speech (TTS) generation for real-world applications, including books, dialogues, and multilingual tasks.  
         | 
| 25 | 
            +
             | 
| 26 | 
            +
            ---
         | 
| 27 | 
            +
             | 
| 28 | 
            +
            ## Model Description
         | 
| 29 | 
            +
             | 
| 30 | 
            +
            Auralis transforms text into natural, high-quality speech with exceptional speed and scalability. It is powered by [Coqui XTTS-v2](https://huggingface.co/coqui/XTTS-v2) and optimized for both consumer-grade and high-performance GPUs. Auralis is designed to meet real-world needs like long-text processing, voice cloning, and concurrent request handling.
         | 
| 31 | 
            +
             | 
| 32 | 
            +
            ### Key Features:
         | 
| 33 | 
            +
            - **Warp-Speed Processing:** Generate speech for an entire novel (e.g., Harry Potter) in ~10 minutes.  
         | 
| 34 | 
            +
            - **Hardware Friendly:** Requires <10GB VRAM on a single NVIDIA RTX 3090.  
         | 
| 35 | 
            +
            - **Scalable:** Handles multiple requests simultaneously.  
         | 
| 36 | 
            +
            - **Streaming:** Seamlessly processes long texts in a streaming format.  
         | 
| 37 | 
            +
            - **Custom Voices:** Enables voice cloning from short reference audio.  
         | 
| 38 | 
            +
             | 
| 39 | 
            +
            ---
         | 
| 40 | 
            +
             | 
| 41 | 
            +
            ## Quick Start
         | 
| 42 | 
            +
             | 
| 43 | 
            +
            ```python
         | 
| 44 | 
            +
            from auralis import TTS, TTSRequest
         | 
| 45 | 
            +
             | 
| 46 | 
            +
            # Initialize the model
         | 
| 47 | 
            +
            tts = TTS().from_pretrained("AstraMindAI/xtts2-gpt")
         | 
| 48 | 
            +
             | 
| 49 | 
            +
            # Create a TTS request
         | 
| 50 | 
            +
            request = TTSRequest(
         | 
| 51 | 
            +
                text="Hello Earth! This is Auralis speaking.",
         | 
| 52 | 
            +
                speaker_files=["reference.wav"]
         | 
| 53 | 
            +
            )
         | 
| 54 | 
            +
             | 
| 55 | 
            +
            # Generate speech
         | 
| 56 | 
            +
            output = tts.generate_speech(request)
         | 
| 57 | 
            +
            output.save("output.wav")
         | 
| 58 | 
            +
            ```
         | 
| 59 | 
            +
             | 
| 60 | 
            +
            ---
         | 
| 61 | 
            +
             | 
| 62 | 
            +
            ## Ebook Generation
         | 
| 63 | 
            +
             | 
| 64 | 
            +
            Auralis converts ebooks into audio at lightning speed. For a complete Python script, see [ebook_audio_generator.py](https://github.com/astramind-ai/Auralis/blob/main/examples/vocalize_a_ebook.py).
         | 
| 65 | 
            +
             | 
| 66 | 
            +
            ```python
         | 
| 67 | 
            +
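            from auralis import TTS, TTSRequest, AudioPreprocessingConfig

            # Load the model once and reuse it for every chapter
            tts = TTS().from_pretrained("AstraMindAI/xtts2-gpt")

            def process_book(chapter_file: str, speaker_file: str):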
         | 
| 68 | 
            +
                # Read chapter
         | 
| 69 | 
            +
                with open(chapter_file, 'r', encoding='utf-8') as f:
         | 
| 70 | 
            +
                    chapter = f.read()
         | 
| 71 | 
            +
                
         | 
| 72 | 
            +
                # You can pass the whole book; Auralis takes care of splitting it
         | 
| 73 | 
            +
                
         | 
| 74 | 
            +
                request = TTSRequest(
         | 
| 75 | 
            +
                    text=chapter,
         | 
| 76 | 
            +
                    speaker_files=[speaker_file],
         | 
| 77 | 
            +
                    audio_config=AudioPreprocessingConfig(
         | 
| 78 | 
            +
                        enhance_speech=True,
         | 
| 79 | 
            +
                        normalize=True
         | 
| 80 | 
            +
                    )
         | 
| 81 | 
            +
                )
         | 
| 82 | 
            +
                    
         | 
| 83 | 
            +
                output = tts.generate_speech(request)
         | 
| 84 | 
            +
                
         | 
| 85 | 
            +
                output.play()
         | 
| 86 | 
            +
                output.save("chapter_output.wav")
         | 
| 87 | 
            +
             | 
| 88 | 
            +
            # Example usage
         | 
| 89 | 
            +
            process_book("chapter1.txt", "reference_voice.wav")
         | 
| 90 | 
            +
            ```
         | 
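The comment in the example above says that long input is split automatically. The README does not describe how, so the following is only a minimal sketch of one common approach: greedily packing whole sentences into chunks under a character budget. The function name `split_into_chunks` and the `max_chars` default are hypothetical and are not part of the Auralis API.

```python
import re


def split_into_chunks(text: str, max_chars: int = 250) -> list[str]:
    """Greedily pack whole sentences into chunks of roughly max_chars.

    Illustrative only: this is NOT the splitter Auralis uses internally.
    A single sentence longer than max_chars is kept whole, not cut.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            # The sentence fits, or it starts a new (possibly over-long) chunk.
            current = candidate
        else:
            # Budget exceeded: close the current chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


chunks = split_into_chunks("One. Two! Three? Four.", max_chars=10)
print(chunks)
```

Splitting on sentence boundaries keeps each chunk prosodically complete, which tends to matter more for speech output than hitting an exact chunk size.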
| 91 | 
            +
             | 
| 92 | 
            +
            ---
         | 
| 93 | 
            +
             | 
| 94 | 
            +
            ## Intended Use
         | 
| 95 | 
            +
             | 
| 96 | 
            +
            Auralis is designed for:
         | 
| 97 | 
            +
            - **Content Creators:** Generate audiobooks, podcasts, or voiceovers.  
         | 
| 98 | 
            +
            - **Developers:** Integrate TTS into applications via a simple Python API.  
         | 
| 99 | 
            +
            - **Accessibility:** Provide audio versions of digital content for people with visual impairments or reading difficulties.  
         | 
| 100 | 
            +
            - **Multilingual Scenarios:** Convert text to speech in multiple supported languages.  
         | 
| 101 | 
            +
             | 
| 102 | 
            +
            ---
         | 
| 103 | 
            +
             | 
| 104 | 
            +
            ## Performance
         | 
| 105 | 
            +
             | 
| 106 | 
            +
            **Benchmarks on NVIDIA RTX 3090:**  
         | 
| 107 | 
            +
            - Short phrases (<100 characters): ~1 second  
         | 
| 108 | 
            +
            - Medium texts (<1,000 characters): ~5-10 seconds  
         | 
| 109 | 
            +
            - Full books (~100,000 characters): ~10 minutes  
         | 
| 110 | 
            +
             | 
| 111 | 
            +
            **Memory Usage:**  
         | 
| 112 | 
            +
            - Base VRAM: ~4GB  
         | 
| 113 | 
            +
            - Peak VRAM: ~10GB  
         | 
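As a rough way to read the benchmark figures above, the full-book number can be turned into a throughput and used to estimate other workloads. This is a back-of-the-envelope sketch that assumes generation time scales linearly with character count, which the README does not claim; the helper name `estimate_seconds` is hypothetical.

```python
# Figures quoted in the benchmarks above (NVIDIA RTX 3090).
BOOK_CHARS = 100_000      # "Full books (~100,000 characters)"
BOOK_SECONDS = 10 * 60    # "~10 minutes"

# Implied throughput: roughly 167 characters per second.
chars_per_second = BOOK_CHARS / BOOK_SECONDS


def estimate_seconds(num_chars: int) -> float:
    """Linear estimate of generation time; an assumption, not a measurement."""
    return num_chars / chars_per_second


print(f"{chars_per_second:.0f} chars/s")
print(f"1,000 chars: about {estimate_seconds(1_000):.0f} s")
```

Under this assumption a 1,000-character text takes about 6 seconds, which falls inside the ~5-10 second range quoted for medium texts.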
| 114 | 
            +
             | 
| 115 | 
            +
            ---
         | 
| 116 | 
            +
             | 
| 117 | 
            +
            ## Model Features
         | 
| 118 | 
            +
             | 
| 119 | 
            +
            1. **Speed & Efficiency:**  
         | 
| 120 | 
            +
               - Smart batching for rapid processing of long texts.  
         | 
| 121 | 
            +
               - Memory-optimized for consumer GPUs.  
         | 
| 122 | 
            +
             | 
| 123 | 
            +
            2. **Easy Integration:**  
         | 
| 124 | 
            +
               - Python API with support for synchronous and asynchronous workflows.  
         | 
| 125 | 
            +
               - Streaming mode for continuous playback during generation.  
         | 
| 126 | 
            +
             | 
| 127 | 
            +
            3. **Audio Quality Enhancements:**  
         | 
| 128 | 
            +
               - Background noise reduction.  
         | 
| 129 | 
            +
               - Voice clarity and volume normalization.  
         | 
| 130 | 
            +
               - Customizable audio preprocessing.  
         | 
| 131 | 
            +
             | 
| 132 | 
            +
            4. **Multilingual Support:**  
         | 
| 133 | 
            +
               - Automatic language detection.  
         | 
| 134 | 
            +
               - High-quality speech in 15+ languages.  
         | 
| 135 | 
            +
             | 
| 136 | 
            +
            5. **Customization:**  
         | 
| 137 | 
            +
               - Voice cloning using short reference clips.  
         | 
| 138 | 
            +
               - Adjustable parameters for tone, pacing, and language.  
         | 
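The asynchronous workflow and concurrent request handling listed above can be pictured with plain `asyncio`. The sketch below does not call Auralis at all: `fake_synthesize` is a hypothetical stand-in for whatever asynchronous generation call the library exposes, and `synthesize_all` with its `max_concurrent` parameter is likewise an assumption made for illustration.

```python
import asyncio


async def fake_synthesize(text: str) -> str:
    """Hypothetical stand-in for a real asynchronous TTS call."""
    await asyncio.sleep(0.01)  # simulate time spent generating audio
    return f"audio<{text}>"


async def synthesize_all(texts: list[str], max_concurrent: int = 4) -> list[str]:
    """Run many requests at once, but never more than max_concurrent."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(text: str) -> str:
        async with semaphore:  # wait for a free slot before starting
            return await fake_synthesize(text)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(worker(t) for t in texts))


results = asyncio.run(synthesize_all(["Hello", "Bonjour", "Hola"]))
print(results)
```

Capping concurrency with a semaphore is one simple way to keep peak memory bounded while still overlapping requests; a real deployment would tune the limit to the available VRAM.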
| 139 | 
            +
             | 
| 140 | 
            +
            ---
         | 
| 141 | 
            +
             | 
| 142 | 
            +
            ## Limitations & Ethical Considerations
         | 
| 143 | 
            +
             | 
| 144 | 
            +
            - **Voice Cloning Risks:** Auralis supports voice cloning, which may raise ethical concerns about misuse. Use responsibly and ensure proper consent.  
         | 
| 145 | 
            +
            - **Accent Limitations:** While robust for many languages, accents and intonations may vary based on the input.  
         | 
| 146 | 
            +
             | 
| 147 | 
            +
            ---
         | 
| 148 | 
            +
             | 
| 149 | 
            +
            ## Citation
         | 
| 150 | 
            +
             | 
| 151 | 
            +
            If you use Auralis in your research or projects, please cite:
         | 
| 152 | 
            +
             | 
| 153 | 
            +
            ```bibtex
         | 
| 154 | 
            +
            @misc{auralis2024,
         | 
| 155 | 
            +
              author = {AstraMind AI},
         | 
| 156 | 
            +
              title = {Auralis: High-Performance Text-to-Speech Engine},
         | 
| 157 | 
            +
              year = {2024},
         | 
| 158 | 
            +
              url = {https://huggingface.co/AstraMindAI/auralis}
         | 
| 159 | 
            +
            }
         | 
| 160 | 
            +
            ```
         | 

