---
tags:
- text-generation
- transformers
- peft
- qlora
- bitsandbytes
- mistral
- mistral-7b
- fine-tune
license: apache-2.0
---

# my-qlora-mistral7b-instruct

This is a **QLoRA fine-tuned** version of the [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) model. It was fine-tuned with **Low-Rank Adaptation (LoRA)** adapters on top of a 4-bit quantized base model, which keeps memory requirements low enough for consumer GPUs.

## 🚀 Model Details

- **Base model**: mistralai/Mistral-7B-Instruct-v0.2
- **Fine-tuning method**: QLoRA with PEFT (see the training sketch at the end of this card)
- **Quantization**: 4-bit (bitsandbytes, NF4 with double quantization)
- **Task**: Instruction following / conversational AI
- **Dataset**: Custom instruction-response pairs
- **Training environment**: Google Colab Pro (T4 / A100 GPU)

## 📦 How to Use

```python
# First, make sure you have the necessary libraries installed:
# pip install transformers peft bitsandbytes accelerate

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, BitsAndBytesConfig
from peft import PeftModel

fine_tuned_model_id = "Falah/my-qlora-mistral7b-instruct"
base_model_id = "mistralai/Mistral-7B-Instruct-v0.2"

print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(fine_tuned_model_id)

print("Loading base model with 4-bit quantization...")
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place the quantized weights on the available GPU(s)
)

print("Loading PEFT adapter onto the base model...")
model = PeftModel.from_pretrained(base_model, fine_tuned_model_id)

# Ensure the model is in evaluation mode
model.eval()

print("Creating text generation pipeline...")
# The model is already loaded and placed, so no device_map or dtype is passed here.
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Define a sample user prompt
user_prompt = "Write a short story about a robot learning to love."

# Wrap the prompt in the Mistral instruction template
formatted_prompt = f"[INST] {user_prompt} [/INST]"

# Generate text
outputs = generator(
    formatted_prompt,
    max_new_tokens=200,
    num_return_sequences=1,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)

# Print the generated text
for i, output in enumerate(outputs):
    print(f"Generated Output {i+1}:\n{output['generated_text']}")
```
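
## 🧪 Training Sketch (illustrative)

The exact hyperparameters used to train this adapter are not published here. The snippet below is a minimal sketch of a typical QLoRA setup with `peft` and `bitsandbytes`; the LoRA rank, alpha, dropout, and target modules are assumed values commonly used for 7B models, not the configuration actually used for this checkpoint.

```python
# Illustrative QLoRA setup -- hyperparameters below are assumptions, not the
# exact values used to train this adapter.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model_id = "mistralai/Mistral-7B-Instruct-v0.2"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model for k-bit training (casts norms, enables input grads, etc.)
model = prepare_model_for_kbit_training(model)

# Assumed LoRA settings targeting the attention projections of Mistral
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable

# From here, the wrapped model can be passed to a trainer of your choice
# (e.g. transformers Trainer or TRL's SFTTrainer) together with your dataset.
```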
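
## 🔗 Merging the Adapter (optional)

If you prefer a standalone checkpoint that does not require `peft` at inference time, the LoRA weights can be folded into the base model with `merge_and_unload()`. Note that merging is done on an unquantized (e.g. float16) copy of the base model, which needs considerably more memory than the 4-bit load above; the output directory below is just an example path.

```python
# Merge the LoRA adapter into a float16 copy of the base model and save a
# standalone checkpoint. The base model is loaded unquantized for merging.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "mistralai/Mistral-7B-Instruct-v0.2"
fine_tuned_model_id = "Falah/my-qlora-mistral7b-instruct"
output_dir = "mistral7b-instruct-merged"  # example output path

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, fine_tuned_model_id)

# Fold the adapter weights into the base weights and drop the PEFT wrapper
merged_model = model.merge_and_unload()

merged_model.save_pretrained(output_dir)
AutoTokenizer.from_pretrained(fine_tuned_model_id).save_pretrained(output_dir)
```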