# Musician-Llama

**Fine-tuned Llama 3.2 1B Instruct for Text-to-MIDI Music Generation**

A specialized music AI assistant that generates MIDI token sequences from natural-language music descriptions.

## Overview

Musician-Llama is a fine-tuned version of Llama 3.2 1B Instruct, optimized for converting text descriptions of music into MIDI token sequences. It understands musical concepts, genres, instruments, and styles, enabling creative music generation from simple descriptions.

- **Input:** Natural-language music description
- **Output:** MIDI token sequence (pipe-separated format)
## Model Details
- Base Model: meta-llama/Llama-3.2-1B-Instruct
- Fine-tuning Method: Supervised Fine-Tuning (SFT)
- Task: Text-to-MIDI token generation
- Framework: Transformers + TRL
- Quantization: 4-bit (NF4 with double quantization)
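The 4-bit NF4 setup listed above corresponds to a bitsandbytes quantization config in `transformers`. A minimal loading sketch, assuming a CUDA-capable environment with `bitsandbytes` installed (not the repository's own loading script):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 with double quantization, matching the Model Details above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Ghanibhuti/Musician-Llama-3.2-1B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```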
## Quick Start

### 1. Installation

```bash
pip install transformers torch miditok
```

### 2. Load Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Ghanibhuti/Musician-Llama-3.2-1B-Instruct",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Ghanibhuti/Musician-Llama-3.2-1B-Instruct")
```
### 3. Generate MIDI Tokens

```python
from transformers import pipeline

# Create text generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
)

# Your music description
USER_INPUT = "Upbeat electronic dance music with strong bass and drum patterns"

# Create chat messages
messages = [
    {"role": "system", "content": "You are a helpful music AI assistant specialized in MIDI token generation."},
    {"role": "user", "content": USER_INPUT},
]

# Generate MIDI tokens
outputs = pipe(
    messages,
    max_new_tokens=4096,
    temperature=0.7,
    top_p=0.8,
    top_k=21,
    repetition_penalty=1.2,
    do_sample=True,
)

# Extract the assistant's reply and normalize it into pipe-separated tuples
assistant_response = outputs[0]["generated_text"][-1]["content"]
tokens = assistant_response.replace(". ", "|").replace(" ", ",").replace(".", "|")
print("Generated MIDI tokens:")
print(tokens)
```
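To see what the final post-processing step does, here is the same replace chain applied to a small hand-written sample (the raw string is an illustrative assumption about the model's space-and-period output format, not actual model output):

```python
# Hypothetical raw model output: space-separated fields, tuples ending in "."
raw = "384 0 60 100. 386 0 60 0. 388 0 62 90."

# ". " and "." become tuple separators "|"; remaining spaces become ","
tokens = raw.replace(". ", "|").replace(" ", ",").replace(".", "|")
print(tokens)  # → 384,0,60,100|386,0,60,0|388,0,62,90|
```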
### 4. Convert Tokens to MIDI (Optional)

```python
from miditok import Octuple, TokenizerConfig

# Initialize MIDI tokenizer
config = TokenizerConfig(
    pitch_range=(21, 109),
    num_velocities=32,
    special_tokens=["PAD", "BOS", "EOS", "MASK"],
    use_tempos=True,
    use_time_signatures=True,
    use_programs=True,
    num_tempos=32,
    tempo_range=(40, 250),
)
miditokenizer = Octuple(config)

# Parse the pipe-separated string into lists of integers
token_list = [
    [int(x) for x in token_tuple_str.split(",")]
    for token_tuple_str in tokens.split("|")
    if token_tuple_str.strip()
]

# Convert to MIDI and save
midi_obj = miditokenizer.decode(token_list)
midi_obj.dump_midi("output.mid")
print("✅ MIDI file saved to output.mid")
```
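The list comprehension above raises `ValueError` on any non-numeric chunk the model emits. A slightly more defensive parser is sketched below; the `fields_per_token=8` default mirrors the length check shown in Troubleshooting and is an assumption about this Octuple configuration:

```python
def parse_tokens(token_str, fields_per_token=8):
    """Split a pipe-separated token string into integer tuples,
    dropping empty, non-numeric, or wrong-length entries before decoding."""
    parsed = []
    for chunk in token_str.split("|"):
        chunk = chunk.strip()
        if not chunk:
            continue  # empty chunk, e.g. from a trailing "|"
        try:
            values = [int(x) for x in chunk.split(",")]
        except ValueError:
            continue  # non-numeric garbage from the model
        if len(values) == fields_per_token:
            parsed.append(values)
    return parsed

# Malformed chunks are silently skipped instead of crashing the decode step
token_list = parse_tokens("1,2,3,4,5,6,7,8|not,numbers|1,2|")
print(token_list)  # → [[1, 2, 3, 4, 5, 6, 7, 8]]
```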
## Complete Example

See the `main.ipynb` notebook for a complete implementation, including:
- Model loading and configuration
- MIDI tokenizer initialization
- Batch generation with different temperatures
- Automatic MIDI file conversion
- Error handling and retry logic
## Generation Parameters

| Parameter | Default | Description |
|---|---|---|
| `max_new_tokens` | 4096 | Maximum MIDI tokens to generate |
| `temperature` | 0.7 | Creativity level (0.1-1.0) |
| `top_p` | 0.8 | Nucleus (top-p) sampling threshold |
| `top_k` | 21 | Top-k sampling cutoff |
| `repetition_penalty` | 1.2 | Penalizes repeated tokens |
### Temperature Guide
- 0.1-0.3: Deterministic, consistent style (good for reproducibility)
- 0.4-0.6: Balanced creativity and consistency
- 0.7-0.85: Creative, more variation
- 0.9+: Very random, may lose coherence
## Supported Music Styles
The model can generate music in various styles:
- Genres: Jazz, Electronic, Classical, Hip-Hop, Rock, Pop, Ambient, Folk
- Tempos: Slow (40-80 BPM) to Fast (180-250 BPM)
- Instruments: Piano, Strings, Brass, Woodwinds, Synth, Drums, etc.
- Moods: Happy, Sad, Energetic, Calm, Mysterious, etc.
## Training Details
- Dataset: Custom MIDI-caption paired dataset
- Epochs: 2
- Batch Size: 8
- Learning Rate: 5e-4
- Max Sequence Length: 4096 tokens
- Quantization: 4-bit NF4 with double quantization
- Optimization: LoRA-style fine-tuning
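The hyperparameters above map roughly onto a TRL SFT configuration. A hedged sketch only (the `SFTConfig` field names are assumptions about the TRL version used; this is not the repository's actual training script):

```python
from trl import SFTConfig

# Illustrative mapping of the Training Details listed above
sft_config = SFTConfig(
    num_train_epochs=2,
    per_device_train_batch_size=8,
    learning_rate=5e-4,
    max_seq_length=4096,
    output_dir="musician-llama-sft",
)
```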
## Output Format

MIDI tokens are represented as pipe-separated tuples:

```
384,0,60,100|386,0,60,0|388,0,62,90|...
```

Each token contains:
- **Time delta:** Timing information
- **Channel:** MIDI channel
- **Pitch:** Note pitch (0-127)
- **Velocity:** Note velocity (0-127)
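Reading one tuple back into named fields can be sketched as follows. The field names follow the four-field layout described above (the document's own description; the trained Octuple tokenizer may use a different sub-token layout):

```python
# Field order as described in the Output Format section (assumption)
FIELDS = ["time_delta", "channel", "pitch", "velocity"]

def describe_token(token_str):
    """Map one comma-separated token tuple to its named fields."""
    values = [int(x) for x in token_str.split(",")]
    return dict(zip(FIELDS, values))

print(describe_token("384,0,60,100"))
# → {'time_delta': 384, 'channel': 0, 'pitch': 60, 'velocity': 100}
```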
## Limitations
- Token Format: Output requires post-processing to convert to playable MIDI
- Sequence Length: Limited to 4096 tokens maximum
- Description Clarity: Quality depends on input description specificity
- Note Accuracy: May occasionally generate unusual pitch combinations
- Tempo Constraints: Respects configured tempo range (40-250 BPM)
## Troubleshooting

### Model Not Found

```python
# Make sure you're using the correct repo ID
model = AutoModelForCausalLM.from_pretrained("Ghanibhuti/Musician-Llama-3.2-1B-Instruct")
```

### Out of Memory

```python
# Use device_map for automatic GPU memory management
model = AutoModelForCausalLM.from_pretrained(
    "Ghanibhuti/Musician-Llama-3.2-1B-Instruct",
    device_map="auto",
    torch_dtype="auto",
)
```

### Invalid MIDI Tokens

```python
# Filter out empty or wrong-length tokens before decoding
tokens = [t for t in token_list if t and len(t) == 8]
```
## Model Characteristics

**✅ Strengths:**
- Understands natural-language music descriptions
- Generates coherent, structured MIDI sequences
- Supports diverse musical styles and tempos
- Fast inference (optimized with 4-bit quantization)
- Good balance between creativity and consistency

**⚠️ Considerations:**
- Works best with detailed, descriptive prompts
- May benefit from temperature tuning for different use cases
- Token sequences require MIDI decoding for playback
## License
Apache License 2.0
## Citation

```bibtex
@misc{musician_llama_2024,
  title={Musician-Llama: Text-to-MIDI Music Generation},
  author={Your Name},
  year={2024},
  url={https://huggingface.co/Ghanibhuti/Musician-Llama-3.2-1B-Instruct}
}
```
## Support
For issues, questions, or suggestions, please visit the model repository.
Made with ❤️ for music generation enthusiasts