---
license: apache-2.0
datasets:
- HuggingFaceH4/ultrachat_200k
- BAAI/Infinity-Instruct
- HuggingFaceH4/ultrafeedback_binarized
- Intel/orca_dpo_pairs
- argilla/OpenHermesPreferences
- BramVanroy/dolly-15k-dutch
base_model:
- Zyphra/Zamba2-1.2B-instruct
library_name: transformers
---

# Model Card for Zamba2-1.2B-instruct-Dutch

Zamba2-1.2B-instruct-Dutch is a Dutch-language instruction-following model obtained through a two-stage fine-tuning process:

1. First stage (base instruction model by Zyphra):
   - Zyphra fine-tuned Zamba2-1.2B to create Zamba2-1.2B-instruct through:
     - SFT training on [ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) and [Infinity-Instruct](https://huggingface.co/datasets/BAAI/Infinity-Instruct)
     - DPO training on [ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized), [orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs), and [OpenHermesPreferences](https://huggingface.co/datasets/argilla/OpenHermesPreferences)
2. Second stage (Dutch language adaptation):
   - Further fine-tuning of Zyphra's Zamba2-1.2B-instruct on the training split of the [dolly-15k-dutch](https://huggingface.co/datasets/BramVanroy/dolly-15k-dutch) dataset

The model maintains the core hybrid architecture of Zamba2 while being optimized for Dutch language understanding and generation.

## Quick start

### Prerequisites

To run Zamba2-1.2B-instruct-Dutch, clone Zyphra's fork of transformers:

1. `git clone https://github.com/Zyphra/transformers_zamba2.git`
2. `cd transformers_zamba2`
3. Install the repository: `pip install -e .`
4. `pip install accelerate`

### Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-1.2B-instruct-Dutch")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-1.2B-instruct-Dutch", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
prompt = "Wat zijn de belangrijkste oorzaken van de val van het Romeinse Rijk?"
sample = [{'role': 'user', 'content': prompt}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print(tokenizer.decode(outputs[0]))
```

## Training Details

The model was fine-tuned using the following approach (a configuration sketch follows the list):

1. Started with the base Zamba2-1.2B-instruct model
2. Fine-tuned on the dolly-15k-dutch dataset using optimized learning rates
3. Implemented memory optimization through gradient checkpointing
4. Utilized mixed precision training (bf16)
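The snippet below is a minimal sketch of how such a run could be set up with the Hugging Face `Trainer`. The hyperparameter values, the dataset column names, the `tokenize_fn` helper, and the output path are illustrative assumptions, not the exact configuration used to produce this model.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "Zyphra/Zamba2-1.2B-instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Training split of the Dutch Dolly dataset
dataset = load_dataset("BramVanroy/dolly-15k-dutch", split="train")

def tokenize_fn(example):
    # Column names are assumptions; adjust them to the actual dataset schema.
    messages = [
        {"role": "user", "content": example["instruction"]},
        {"role": "assistant", "content": example["response"]},
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    return tokenizer(text, truncation=True, max_length=2048)

tokenized = dataset.map(tokenize_fn, remove_columns=dataset.column_names)

training_args = TrainingArguments(
    output_dir="zamba2-1.2b-instruct-dutch",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-5,            # placeholder; see the learning-rate optimization below
    bf16=True,                     # mixed precision training
    gradient_checkpointing=True,   # memory optimization
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,
    # mlm=False makes the collator copy input_ids into labels (causal LM objective)
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```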
### Fine-tuning Configuration

Fine-tuning uses a learning-rate optimization routine implemented by the custom `LROptimizerCallback` class, which can be found in _lr_optimizer.py_:

```python
from transformers import Trainer
from lr_optimizer import setup_training, LROptimizerCallback

# `model` and `training_args` are assumed to be defined beforehand
callback = LROptimizerCallback(
    num_trials=10,           # number of learning-rate candidates to evaluate
    lr_range=(1e-6, 1e-4)    # interval searched for the learning rate
)

trainer = Trainer(
    model=model,
    args=training_args,
    callbacks=[callback]
)

trainer.train()
```

## Model Architecture

Zamba2-1.2B-instruct-Dutch maintains the hybrid SSM-attention architecture of the base model:

- Backbone of Mamba2 layers interleaved with shared attention layers
- LoRA projection matrices for the shared transformer blocks
- Rotary position embeddings in the shared attention layer
- Original model embeddings concatenated into the shared attention block for improved information retention
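The listing below is a toy, illustrative sketch of that wiring — a shared attention block reused at several depths, a per-depth LoRA projection, and the original embeddings concatenated back in. It is not the actual Zamba2 implementation (which lives in Zyphra's transformers fork): `nn.Identity` stands in for the Mamba2 state-space layers, rotary position embeddings are omitted, and all class names are invented for illustration.

```python
import torch
import torch.nn as nn


class LoRAProjection(nn.Module):
    """Base projection plus a small low-rank update (LoRA)."""

    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(dim, dim, bias=False)
        self.lora_a = nn.Linear(dim, rank, bias=False)
        self.lora_b = nn.Linear(rank, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.lora_b(self.lora_a(x))


class SharedAttentionBlock(nn.Module):
    """One attention block whose weights are reused at several depths.

    The original token embeddings are concatenated to the hidden state and
    projected back down before attention; each depth supplies its own
    LoRAProjection so the shared weights can still specialize per position
    in the stack. (Rotary position embeddings are omitted for brevity.)
    """

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.in_proj = nn.Linear(2 * dim, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, hidden, embeddings, lora):
        x = self.in_proj(torch.cat([hidden, embeddings], dim=-1))
        x = lora(x)
        out, _ = self.attn(x, x, x, need_weights=False)
        return hidden + out


class ToyHybridBackbone(nn.Module):
    """Mamba2-style layers (stand-ins here) interleaved with the shared block."""

    def __init__(self, dim: int = 64, depth: int = 6, share_every: int = 3):
        super().__init__()
        # nn.Identity stands in for a real Mamba2 state-space layer.
        self.ssm_layers = nn.ModuleList([nn.Identity() for _ in range(depth)])
        self.shared_attention = SharedAttentionBlock(dim)
        self.loras = nn.ModuleList(
            [LoRAProjection(dim) for _ in range(depth // share_every)]
        )
        self.share_every = share_every

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        hidden = embeddings
        for i, ssm in enumerate(self.ssm_layers):
            hidden = ssm(hidden)
            if (i + 1) % self.share_every == 0:
                lora = self.loras[(i + 1) // self.share_every - 1]
                hidden = self.shared_attention(hidden, embeddings, lora)
        return hidden


backbone = ToyHybridBackbone()
print(backbone(torch.randn(1, 16, 64)).shape)  # torch.Size([1, 16, 64])
```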