# Musician-Llama

**Fine-tuned Llama 3.2 1B Instruct for Text-to-MIDI Music Generation**

A specialized music AI assistant that generates MIDI token sequences from natural-language music descriptions.

## Overview

Musician-Llama is a fine-tuned version of Llama 3.2 1B Instruct, optimized for converting text descriptions of music into MIDI token sequences. It understands musical concepts, genres, instruments, and styles, enabling creative music generation from simple descriptions.

- **Input:** Natural-language music description
- **Output:** MIDI token sequence (pipe-separated format)
## Model Details
- Base Model: meta-llama/Llama-3.2-1B-Instruct
- Fine-tuning Method: Supervised Fine-Tuning (SFT)
- Task: Text-to-MIDI token generation
- Framework: Transformers + TRL
- Quantization: 4-bit (NF4 with double quantization)
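The 4-bit NF4 setup listed above corresponds to a bitsandbytes quantization config in `transformers`. A minimal loading sketch, assuming a CUDA-capable environment with `bitsandbytes` installed (not the repository's own loading script):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 with double quantization, matching the Model Details above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Ghanibhuti/Musician-Llama-3.2-1B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```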
## Quick Start

### 1. Installation

```bash
pip install transformers torch miditok
```

### 2. Load Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Ghanibhuti/Musician-Llama-3.2-1B-Instruct",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Ghanibhuti/Musician-Llama-3.2-1B-Instruct")
```
### 3. Generate MIDI Tokens

```python
from transformers import pipeline

# Create text generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
)

# Your music description
USER_INPUT = "Upbeat electronic dance music with strong bass and drum patterns"

# Create chat messages
messages = [
    {"role": "system", "content": "You are a helpful music AI assistant specialized in MIDI token generation."},
    {"role": "user", "content": USER_INPUT},
]

# Generate MIDI tokens
outputs = pipe(
    messages,
    max_new_tokens=4096,
    temperature=0.7,
    top_p=0.8,
    top_k=21,
    repetition_penalty=1.2,
    do_sample=True,
)

# Extract the assistant's reply and normalize it into pipe-separated tuples
assistant_response = outputs[0]["generated_text"][-1]["content"]
tokens = assistant_response.replace(". ", "|").replace(" ", ",").replace(".", "|")
print("Generated MIDI tokens:")
print(tokens)
```
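To see what the final post-processing step does, here is the same replace chain applied to a small hand-written sample (the raw string is an illustrative assumption about the model's space-and-period output format, not actual model output):

```python
# Hypothetical raw model output: space-separated fields, tuples ending in "."
raw = "384 0 60 100. 386 0 60 0. 388 0 62 90."

# ". " and "." become tuple separators "|"; remaining spaces become ","
tokens = raw.replace(". ", "|").replace(" ", ",").replace(".", "|")
print(tokens)  # → 384,0,60,100|386,0,60,0|388,0,62,90|
```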
### 4. Convert Tokens to MIDI (Optional)

```python
from miditok import Octuple, TokenizerConfig

# Initialize MIDI tokenizer
config = TokenizerConfig(
    pitch_range=(21, 109),
    num_velocities=32,
    special_tokens=["PAD", "BOS", "EOS", "MASK"],
    use_tempos=True,
    use_time_signatures=True,
    use_programs=True,
    num_tempos=32,
    tempo_range=(40, 250),
)
miditokenizer = Octuple(config)

# Parse the pipe-separated string into lists of integers
token_list = [
    [int(x) for x in token_tuple_str.split(",")]
    for token_tuple_str in tokens.split("|")
    if token_tuple_str.strip()
]

# Convert to MIDI and save
midi_obj = miditokenizer.decode(token_list)
midi_obj.dump_midi("output.mid")
print("✅ MIDI file saved to output.mid")
```
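The list comprehension above raises `ValueError` on any non-numeric chunk the model emits. A slightly more defensive parser is sketched below; the `fields_per_token=8` default mirrors the length check shown in Troubleshooting and is an assumption about this Octuple configuration:

```python
def parse_tokens(token_str, fields_per_token=8):
    """Split a pipe-separated token string into integer tuples,
    dropping empty, non-numeric, or wrong-length entries before decoding."""
    parsed = []
    for chunk in token_str.split("|"):
        chunk = chunk.strip()
        if not chunk:
            continue  # empty chunk, e.g. from a trailing "|"
        try:
            values = [int(x) for x in chunk.split(",")]
        except ValueError:
            continue  # non-numeric garbage from the model
        if len(values) == fields_per_token:
            parsed.append(values)
    return parsed

# Malformed chunks are silently skipped instead of crashing the decode step
token_list = parse_tokens("1,2,3,4,5,6,7,8|not,numbers|1,2|")
print(token_list)  # → [[1, 2, 3, 4, 5, 6, 7, 8]]
```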
## Complete Example

See the `main.ipynb` notebook for a complete implementation, including:
- Model loading and configuration
- MIDI tokenizer initialization
- Batch generation with different temperatures
- Automatic MIDI file conversion
- Error handling and retry logic
## Generation Parameters

| Parameter | Default | Description |
|---|---|---|
| `max_new_tokens` | 4096 | Maximum MIDI tokens to generate |
| `temperature` | 0.7 | Creativity level (0.1-1.0) |
| `top_p` | 0.8 | Nucleus (top-p) sampling threshold |
| `top_k` | 21 | Top-k sampling cutoff |
| `repetition_penalty` | 1.2 | Penalizes repeated tokens |
### Temperature Guide
- 0.1-0.3: Deterministic, consistent style (good for reproducibility)
- 0.4-0.6: Balanced creativity and consistency
- 0.7-0.85: Creative, more variation
- 0.9+: Very random, may lose coherence
## Supported Music Styles
The model can generate music in various styles:
- Genres: Jazz, Electronic, Classical, Hip-Hop, Rock, Pop, Ambient, Folk
- Tempos: Slow (40-80 BPM) to Fast (180-250 BPM)
- Instruments: Piano, Strings, Brass, Woodwinds, Synth, Drums, etc.
- Moods: Happy, Sad, Energetic, Calm, Mysterious, etc.
## Training Details
- Dataset: Custom MIDI-caption paired dataset
- Epochs: 2
- Batch Size: 8
- Learning Rate: 5e-4
- Max Sequence Length: 4096 tokens
- Quantization: 4-bit NF4 with double quantization
- Optimization: LoRA-style fine-tuning
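The hyperparameters above map roughly onto a TRL SFT configuration. A hedged sketch only (the `SFTConfig` field names are assumptions about the TRL version used; this is not the repository's actual training script):

```python
from trl import SFTConfig

# Illustrative mapping of the Training Details listed above
sft_config = SFTConfig(
    num_train_epochs=2,
    per_device_train_batch_size=8,
    learning_rate=5e-4,
    max_seq_length=4096,
    output_dir="musician-llama-sft",
)
```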
## Output Format

MIDI tokens are represented as pipe-separated tuples:

```
384,0,60,100|386,0,60,0|388,0,62,90|...
```

Each token contains:
- **Time delta:** Timing information
- **Channel:** MIDI channel
- **Pitch:** Note pitch (0-127)
- **Velocity:** Note velocity (0-127)
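Reading one tuple back into named fields can be sketched as follows. The field names follow the four-field layout described above (the document's own description; the trained Octuple tokenizer may use a different sub-token layout):

```python
# Field order as described in the Output Format section (assumption)
FIELDS = ["time_delta", "channel", "pitch", "velocity"]

def describe_token(token_str):
    """Map one comma-separated token tuple to its named fields."""
    values = [int(x) for x in token_str.split(",")]
    return dict(zip(FIELDS, values))

print(describe_token("384,0,60,100"))
# → {'time_delta': 384, 'channel': 0, 'pitch': 60, 'velocity': 100}
```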
## Limitations
- Token Format: Output requires post-processing to convert to playable MIDI
- Sequence Length: Limited to 4096 tokens maximum
- Description Clarity: Quality depends on input description specificity
- Note Accuracy: May occasionally generate unusual pitch combinations
- Tempo Constraints: Respects configured tempo range (40-250 BPM)
## Troubleshooting

### Model Not Found

```python
# Make sure you're using the correct repo ID
model = AutoModelForCausalLM.from_pretrained("Ghanibhuti/Musician-Llama-3.2-1B-Instruct")
```

### Out of Memory

```python
# Use device_map for automatic GPU memory management
model = AutoModelForCausalLM.from_pretrained(
    "Ghanibhuti/Musician-Llama-3.2-1B-Instruct",
    device_map="auto",
    torch_dtype="auto",
)
```

### Invalid MIDI Tokens

```python
# Filter out empty or wrong-length tokens before decoding
tokens = [t for t in token_list if t and len(t) == 8]
```
## Model Characteristics

**✅ Strengths:**
- Understands natural-language music descriptions
- Generates coherent, structured MIDI sequences
- Supports diverse musical styles and tempos
- Fast inference (optimized with 4-bit quantization)
- Good balance between creativity and consistency

**⚠️ Considerations:**
- Works best with detailed, descriptive prompts
- May benefit from temperature tuning for different use cases
- Token sequences require MIDI decoding for playback
## License
Apache License 2.0
## Citation

```bibtex
@misc{musician_llama_2024,
  title={Musician-Llama: Text-to-MIDI Music Generation},
  author={Your Name},
  year={2024},
  url={https://huggingface.co/Ghanibhuti/Musician-Llama-3.2-1B-Instruct}
}
```
## Support
For issues, questions, or suggestions, please visit the model repository.
Made with ❤️ for music generation enthusiasts