---
license: mit
tags:
  - unsloth
  - trl
  - sft
language:
  - en
base_model:
  - SparkAudio/Spark-TTS-0.5B
pipeline_tag: text-to-speech
---

# Spark-TTS 0.5B Fine-Tuned Model (16-bit Merged)

This repository hosts a Spark-TTS 0.5B model fine-tuned for speech synthesis with the Unsloth and TRL libraries. The weights are shared in a merged 16-bit format for efficient storage and faster inference while preserving output quality.

## Model Details

- **Architecture:** Transformer-based text-to-speech (Spark-TTS)
- **Model size:** 0.5 billion parameters
- **Precision:** 16-bit merged weights (optimized for inference)
- **Fine-tuning:** Full fine-tuning enabled with LoRA adapters (bfloat16 precision)
- **Training framework:** Unsloth & TRL (supervised fine-tuning); a training sketch follows this list
- **Tokenizer:** Compatible tokenizer included
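
For reference, below is a minimal sketch of how a fine-tuning run of this kind is typically set up with Unsloth and TRL. The dataset name, LoRA hyperparameters, trainer arguments, and output paths are illustrative placeholders, not the exact configuration used to produce this checkpoint, and keyword names can differ slightly across TRL versions.

```python
from unsloth import FastModel
import torch
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Load the base Spark-TTS model in bfloat16 (LoRA fine-tuning requires bfloat16).
model, tokenizer = FastModel.from_pretrained(
    "SparkAudio/Spark-TTS-0.5B",
    max_seq_length=2048,
    dtype=torch.bfloat16,
    full_finetuning=False,
)

# Attach LoRA adapters; rank, alpha, and dropout are placeholder values and
# the default target modules are used.
model = FastModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    use_gradient_checkpointing="unsloth",
)

# "your_tts_dataset" is a placeholder: a dataset whose "text" column holds the
# prompt / audio-token sequences used for supervised fine-tuning.
dataset = load_dataset("your_tts_dataset", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,          # newer TRL versions use processing_class=
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        output_dir="outputs",
    ),
)
trainer.train()

# Merge the LoRA adapters into the base weights and export them in 16-bit,
# which is how Unsloth typically produces a merged checkpoint like this one.
model.save_pretrained_merged("spark-tts-0.5b-finetune-16bit", tokenizer,
                             save_method="merged_16bit")
```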

## Intended Use

This model is intended for research and development in text-to-speech synthesis tasks, especially where GPU memory efficiency and long context handling are priorities.
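
For memory-constrained GPUs, the merged 16-bit weights can also be quantized to 4-bit at load time via Unsloth's `load_in_4bit` option. This is a minimal sketch under that assumption; evaluate the quality and latency trade-off of on-the-fly quantization for your use case.

```python
from unsloth import FastModel

# Optionally quantize the merged 16-bit weights to 4-bit at load time
# to reduce GPU memory usage (output quality may degrade slightly).
model, tokenizer = FastModel.from_pretrained(
    "sureshbeekhani/spark-tts-0.5b-finetune-16bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
```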

## Usage

```python
from unsloth import FastModel
import torch

# Load the fine-tuned Spark-TTS model and tokenizer from the Hugging Face Hub
model, tokenizer = FastModel.from_pretrained(
    "sureshbeekhani/spark-tts-0.5b-finetune-16bit",
    max_seq_length=2048,   # Adjust based on your needs
    dtype=torch.bfloat16,  # bfloat16 matches the precision used for fine-tuning
    full_finetuning=False, # Inference only; no further full fine-tuning
)

# Example text input for speech synthesis
text = "Hello, welcome to the Spark-TTS fine-tuned model demo!"

# Tokenize the input text and move it to the model's device
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate output tokens from the model
# Note: Spark-TTS generates audio codec tokens, which still need to be decoded
# into a waveform by the Spark-TTS synthesis pipeline.
outputs = model.generate(**inputs, max_new_tokens=1024)  # adjust as needed

# Process or save the outputs as needed (e.g., decode them to an audio waveform);
# this step depends on your synthesis pipeline.

print("Inference completed successfully.")
```
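
Note that `model.generate` returns token IDs; turning them into audio requires the Spark-TTS codec from the upstream SparkAudio/Spark-TTS project. As a minimal sketch, assuming you have already decoded the tokens into a NumPy waveform array, it can be written to disk with the `soundfile` package. The 16 kHz sample rate below is an assumption; use whatever your decoding pipeline produces.

```python
import numpy as np
import soundfile as sf

# Placeholder: `waveform` is assumed to be a 1-D float32 NumPy array obtained
# by decoding the generated tokens with the Spark-TTS audio codec.
waveform = np.zeros(16000, dtype=np.float32)  # replace with your decoded audio

# Write the waveform to a WAV file; match the sample rate to your pipeline.
sf.write("output.wav", waveform, samplerate=16000)
```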

## Limitations

- LoRA fine-tuning is supported only with bfloat16 precision.
- Designed primarily for speech synthesis; may not perform well on unrelated NLP tasks.
- Production use should be tested carefully for latency and quality trade-offs.

## License

This model is licensed under the MIT License.