Musician-Llama

Fine-tuned Llama 3.2 1B Instruct for Text-to-MIDI Music Generation

A specialized music AI assistant that generates MIDI token sequences from natural language music descriptions.

Overview

Musician-Llama is a fine-tuned version of Llama 3.2 1B Instruct optimized for converting text descriptions of music into MIDI token sequences. It understands musical concepts, genres, instruments, and styles, enabling creative music generation from simple descriptions.

Input: Natural language music description
Output: MIDI token sequence (pipe-separated format)

Model Details

  • Base Model: meta-llama/Llama-3.2-1B-Instruct
  • Fine-tuning Method: Supervised Fine-Tuning (SFT)
  • Task: Text-to-MIDI token generation
  • Framework: Transformers + TRL
  • Quantization: 4-bit (NF4 with double quantization)
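
The 4-bit NF4 setup listed above can be reproduced at load time with transformers' BitsAndBytesConfig. A minimal sketch; the bfloat16 compute dtype is an assumption not stated in this card:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization, matching the
# settings listed above; the compute dtype is an assumption.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Ghanibhuti/Musician-Llama-3.2-1B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```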

Quick Start

1. Installation

pip install transformers torch miditok accelerate

2. Load Model

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Ghanibhuti/Musician-Llama-3.2-1B-Instruct", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("Ghanibhuti/Musician-Llama-3.2-1B-Instruct")

3. Generate MIDI Tokens

from transformers import pipeline

# Create text generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
)

# Your music description
USER_INPUT = "Upbeat electronic dance music with strong bass and drum patterns"

# Create chat messages
messages = [
    {"role": "system", "content": "You are a helpful music AI assistant specialized in MIDI token generation."},
    {"role": "user", "content": USER_INPUT},
]

# Generate MIDI tokens
outputs = pipe(
    messages,
    max_new_tokens=4096,
    temperature=0.7,
    top_p=0.8,
    top_k=21,
    repetition_penalty=1.2,
    do_sample=True,
)

# Extract the assistant's reply
assistant_response = outputs[0]["generated_text"][-1]["content"]

# Normalize the reply into pipe-separated, comma-delimited token tuples
tokens = assistant_response.replace(". ", "|").replace(" ", ",").replace(".", "|")

print("Generated MIDI tokens:")
print(tokens)

4. Convert Tokens to MIDI (Optional)

from miditok import Octuple, TokenizerConfig
from pathlib import Path

# Initialize MIDI tokenizer
config = TokenizerConfig(
    pitch_range=(21, 109),
    num_velocities=32,
    special_tokens=["PAD", "BOS", "EOS", "MASK"],
    use_tempos=True,
    use_time_signatures=True,
    use_programs=True,
    num_tempos=32,
    tempo_range=(40, 250),
)
miditokenizer = Octuple(config)

# Parse and decode tokens
token_list = [
    [int(x) for x in token_tuple_str.split(",")]
    for token_tuple_str in tokens.split("|")
    if token_tuple_str.strip()
]

# Convert to MIDI
midi_obj = miditokenizer.decode(token_list)
midi_obj.dump_midi("output.mid")

print("✅ MIDI file saved to output.mid")

Complete Example

See the main.ipynb notebook for a complete implementation including:

  • Model loading and configuration
  • MIDI tokenizer initialization
  • Batch generation with different temperatures
  • Automatic MIDI file conversion
  • Error handling and retry logic
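
The batch generation with different temperatures mentioned above can be sketched as building one set of sampling kwargs per temperature and passing each to the pipeline from the Quick Start. The helper name is illustrative, not part of the notebook:

```python
# Hypothetical helper: build one kwargs dict per temperature,
# reusing the generation defaults from this model card.
def sweep_kwargs(temperatures):
    base = {
        "max_new_tokens": 4096,
        "top_p": 0.8,
        "top_k": 21,
        "repetition_penalty": 1.2,
        "do_sample": True,
    }
    return [{**base, "temperature": t} for t in temperatures]

# Each kwargs dict can then be passed to the pipeline:
#   outputs = pipe(messages, **kwargs)
for kwargs in sweep_kwargs([0.3, 0.7, 0.9]):
    print(kwargs["temperature"])
```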

Generation Parameters

Parameter           Default  Description
max_new_tokens      4096     Maximum number of MIDI tokens to generate
temperature         0.7      Creativity level (0.1-1.0)
top_p               0.8      Nucleus (top-p) sampling threshold
top_k               21       Top-k sampling cutoff
repetition_penalty  1.2      Penalty applied to repeated tokens

Temperature Guide

  • 0.1-0.3: Deterministic, consistent style (good for reproducibility)
  • 0.4-0.6: Balanced creativity and consistency
  • 0.7-0.85: Creative, more variation
  • 0.9+: Very random, may lose coherence

Supported Music Styles

The model can generate music in various styles:

  • Genres: Jazz, Electronic, Classical, Hip-Hop, Rock, Pop, Ambient, Folk
  • Tempos: Slow (40-80 BPM) to Fast (180-250 BPM)
  • Instruments: Piano, Strings, Brass, Woodwinds, Synth, Drums, etc.
  • Moods: Happy, Sad, Energetic, Calm, Mysterious, etc.
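
Descriptions that combine several of these dimensions tend to produce the most specific results. A hypothetical helper for assembling such prompts; the function name and template are illustrative, not part of the model's API:

```python
# Illustrative prompt builder combining genre, tempo, instruments and mood.
def build_prompt(genre: str, tempo_bpm: int, instruments: list[str], mood: str) -> str:
    return (
        f"{mood} {genre.lower()} track at {tempo_bpm} BPM "
        f"featuring {' and '.join(instruments)}"
    )

print(build_prompt("Jazz", 90, ["piano", "brass"], "Calm"))
# Calm jazz track at 90 BPM featuring piano and brass
```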

Training Details

  • Dataset: Custom MIDI-caption paired dataset
  • Epochs: 2
  • Batch Size: 8
  • Learning Rate: 5e-4
  • Max Sequence Length: 4096 tokens
  • Quantization: 4-bit NF4 with double quantization
  • Optimization: LoRA-style fine-tuning
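
The "LoRA-style fine-tuning" above can be sketched with a peft adapter configuration. The rank, alpha, dropout, and target modules below are assumptions; the card states only the hyperparameters listed above:

```python
from peft import LoraConfig

# LoRA adapter configuration; rank, alpha, dropout and target modules
# are assumptions -- the card only says "LoRA-style fine-tuning".
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Pair this with trl's SFTTrainer using the values listed above:
# epochs=2, per-device batch size=8, learning rate=5e-4,
# max sequence length=4096, 4-bit NF4 quantization.
```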

Output Format

MIDI tokens are represented as pipe-separated tuples:

384,0,60,100|386,0,60,0|388,0,62,90|...

Each token tuple encodes several fields (the exact layout depends on the tokenizer configuration), for example:

  • Time delta: Timing information
  • Channel: MIDI channel
  • Pitch: Note pitch (0-127)
  • Velocity: Note velocity (0-127)
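
Converting between this string format and integer tuples is a simple split and join. A minimal sketch; the helper names are illustrative:

```python
# Parse a pipe-separated token string into lists of ints, and back.
def parse_tokens(token_str: str) -> list[list[int]]:
    return [
        [int(field) for field in tup.split(",")]
        for tup in token_str.split("|")
        if tup.strip()
    ]

def serialize_tokens(token_list: list[list[int]]) -> str:
    return "|".join(",".join(str(f) for f in tup) for tup in token_list)

sample = "384,0,60,100|386,0,60,0|388,0,62,90"
parsed = parse_tokens(sample)
print(parsed[0])  # [384, 0, 60, 100]
assert serialize_tokens(parsed) == sample
```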

Limitations

  1. Token Format: Output requires post-processing to convert to playable MIDI
  2. Sequence Length: Limited to 4096 tokens maximum
  3. Description Clarity: Quality depends on input description specificity
  4. Note Accuracy: May occasionally generate unusual pitch combinations
  5. Tempo Constraints: Respects configured tempo range (40-250 BPM)
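
Limitations 2 and 4 can be mitigated with a post-generation sanity pass. A hypothetical filter; the field positions (pitch at index 2, velocity at index 3) follow the Output Format section and are an assumption:

```python
# Hypothetical sanity filter for generated token tuples: drop tuples
# whose pitch or velocity fall outside the MIDI 0-127 range, and cap
# the sequence length at the model's maximum.
def sanitize(token_list, max_tokens=4096):
    valid = [
        t for t in token_list
        if len(t) >= 4 and 0 <= t[2] <= 127 and 0 <= t[3] <= 127
    ]
    return valid[:max_tokens]

print(sanitize([[384, 0, 60, 100], [386, 0, 300, 0]]))
# [[384, 0, 60, 100]]  -- pitch 300 is out of range
```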

Troubleshooting

Model Not Found

# Make sure you're using the correct repo ID
model = AutoModelForCausalLM.from_pretrained("username/Musician-Llama-3.2-1B-Instruct")

Out of Memory

# Use device_map for automatic GPU memory management
model = AutoModelForCausalLM.from_pretrained(
    "Ghanibhuti/Musician-Llama-3.2-1B-Instruct",
    device_map="auto",
    torch_dtype="auto",
)

Invalid MIDI Tokens

# Keep only complete 8-field Octuple tuples; drop empty or truncated ones
tokens = [t for t in token_list if t and len(t) == 8]

Model Characteristics

Strengths:

  • Understands natural language music descriptions
  • Generates coherent, structured MIDI sequences
  • Supports diverse musical styles and tempos
  • Fast inference (optimized with 4-bit quantization)
  • Good balance between creativity and consistency

⚠️ Considerations:

  • Best with detailed, descriptive prompts
  • May benefit from temperature tuning for different use cases
  • Token sequences require MIDI decoding for playback

License

Apache License 2.0

Citation

@misc{musician_llama_2024,
  title={Musician-Llama: Text-to-MIDI Music Generation},
  author={Your Name},
  year={2024},
  url={https://huggingface.co/Ghanibhuti/Musician-Llama-3.2-1B-Instruct}
}

Support

For issues, questions, or suggestions, please visit the model repository.


Made with ❤️ for music generation enthusiasts
