---
license: mit
tags:
- unsloth
- trl
- sft
language:
- en
base_model:
- SparkAudio/Spark-TTS-0.5B
pipeline_tag: text-to-speech
---

# Spark-TTS 0.5B Fine-Tuned Model (16-bit Merged)

This repository hosts a fine-tuned Spark-TTS 0.5B model for text-to-speech synthesis, trained with the Unsloth and TRL libraries. The weights are shared in a merged 16-bit format for compact storage and faster inference while preserving output quality.

## Model Details

- **Architecture:** Transformer-based text-to-speech (Spark-TTS)
- **Model Size:** 0.5 billion parameters
- **Precision:** 16-bit merged weights, optimized for inference (see the sanity check after this list)
- **Fine-tuning:** LoRA adapters trained in bfloat16 and merged into the base model
- **Training Framework:** Unsloth & TRL (supervised fine-tuning)
- **Tokenizer:** Compatible tokenizer included
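
As a quick sanity check of the size and precision claims above, you can count parameters and inspect weight dtypes after loading. A minimal sketch, using the same loading call shown in the Usage section below:

```python
import torch
from unsloth import FastModel

# Load the merged checkpoint in inference mode (see Usage below)
model, tokenizer = FastModel.from_pretrained(
    "sureshbeekhani/spark-tts-0.5b-finetune-16bit",
    max_seq_length=2048,
    dtype=torch.bfloat16,
    full_finetuning=False,
)

# Count parameters and list the dtypes of the merged weights
n_params = sum(p.numel() for p in model.parameters())
dtypes = {str(p.dtype) for p in model.parameters()}
print(f"Parameters: {n_params / 1e9:.2f}B")  # expect roughly 0.5B
print(f"Weight dtypes: {dtypes}")            # expect a 16-bit dtype
```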

## Intended Use

This model is intended for research and development in text-to-speech synthesis, especially where GPU memory efficiency and long-context handling are priorities.
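
For a rough sense of the memory footprint on your hardware, you can check allocated CUDA memory right after loading; `max_seq_length` is also the knob for longer contexts. A minimal sketch, assuming a CUDA-capable GPU (the 4096 context length is an illustrative value, not a recommendation):

```python
import torch
from unsloth import FastModel

# Load with a longer context window; adjust max_seq_length to your use case
model, tokenizer = FastModel.from_pretrained(
    "sureshbeekhani/spark-tts-0.5b-finetune-16bit",
    max_seq_length=4096,  # illustrative long-context setting
    dtype=torch.bfloat16,
    full_finetuning=False,
)

# Report how much GPU memory the loaded weights occupy
if torch.cuda.is_available():
    allocated_gib = torch.cuda.memory_allocated() / 1024**3
    print(f"GPU memory allocated after load: {allocated_gib:.2f} GiB")
```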

## Usage

```python
from unsloth import FastModel
import torch

# Load the fine-tuned Spark-TTS model and tokenizer from the Hugging Face Hub
model, tokenizer = FastModel.from_pretrained(
    "sureshbeekhani/spark-tts-0.5b-finetune-16bit",
    max_seq_length=2048,    # Adjust based on your needs
    dtype=torch.bfloat16,   # bfloat16 for LoRA compatibility and efficiency
    full_finetuning=False,  # False when using the model for inference only
)

# Example text input for speech synthesis
text = "Hello, welcome to the Spark-TTS fine-tuned model demo!"

# Tokenize the input text
inputs = tokenizer(text, return_tensors="pt")

# Generate speech output from the model
# Note: adjust this to your model's specific generate method if applicable
outputs = model.generate(**inputs)

# Process or save the outputs as needed (e.g., convert to an audio waveform)
# This step depends on your model's output format and synthesis pipeline
print("Inference completed successfully.")
```

## Limitations

- LoRA fine-tuning is supported only in bfloat16 precision (see the sketch below).
- The model is designed primarily for speech synthesis and may not perform well on unrelated NLP tasks.
- Usage in production should be tested carefully for latency and quality trade-offs.
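
If you want to continue fine-tuning with LoRA, load the checkpoint in bfloat16 and attach adapters. A minimal sketch, assuming Unsloth's `get_peft_model` helper; the rank, alpha, and target modules below are illustrative values, not the settings used to train this checkpoint:

```python
import torch
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    "sureshbeekhani/spark-tts-0.5b-finetune-16bit",
    max_seq_length=2048,
    dtype=torch.bfloat16,   # LoRA fine-tuning is supported only in bfloat16
    full_finetuning=False,
)

# Attach LoRA adapters; hyperparameters here are illustrative examples
model = FastModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```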

## License

This model is licensed under the MIT License.