# 🧠 DeepSeek-Qwen-1.5B-Multitask-LoRA
- **Author:** Gilbert Akham
- **License:** Apache-2.0
- **Base model:** deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
- **Adapter type:** LoRA (PEFT)
- **Capabilities:** Multi-task generalization & reasoning
## 🌟 Overview
This model is a LoRA-tuned variant of DeepSeek-R1-Distill-Qwen-1.5B, trained on a multi-task mixture designed to teach the model to:
- write professional emails
- continue stories coherently
- hold conversations and reason (from SmolTalk)
- summarize long articles (CNN/DailyMail)
- answer technical questions
- generate reports and structured text
It demonstrates strong reasoning, clarity, and context retention while remaining practical for small-scale compute deployments (compatible with 4-bit quantization).
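The adapter can be loaded on top of the 4-bit quantized base model with 🤗 Transformers, PEFT, and BitsAndBytes. The snippet below is a minimal sketch, assuming the adapter is published as `GilbertAkham/deepseek-R1-multitask-lora`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
adapter_id = "GilbertAkham/deepseek-R1-multitask-lora"  # assumed adapter repo id

# 4-bit quantization with FP16 compute (see Training Details below)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA adapter to the quantized base model
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```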
## 🧩 Training Details
| Parameter | Value |
|---|---|
| Base model | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B |
| Adapter | LoRA (r=8, alpha=32, dropout=0.1) |
| Max sequence length | 1024 |
| Learning rate | 3e-5 (cosine decay) |
| Optimizer | adamw_8bit |
| Grad Accumulation | 4 |
| Precision | 4-bit quantized, FP16 compute |
| Steps | 12k total (best checkpoint at ~8.2k) |
| Training time | ~2.5 h on an A4000 |
| Frameworks | 🤗 Transformers, PEFT, TRL, BitsAndBytes |
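As a rough illustration, the hyperparameters above map onto a PEFT configuration along the lines of the sketch below; the LoRA target modules and the exact TRL trainer wiring are assumptions, not the released training script:

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter configuration from the table above
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed target modules
)

# Optimization settings from the table above
training_args = TrainingArguments(
    output_dir="deepseek-qwen-1.5b-multitask-lora",
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    optim="adamw_bnb_8bit",          # 8-bit AdamW via bitsandbytes
    gradient_accumulation_steps=4,
    max_steps=12_000,
    fp16=True,
)

# The LoRA-wrapped 4-bit base model (via peft.get_peft_model) would then be passed,
# together with these args and the multi-task dataset mixture truncated to 1024 tokens,
# to a TRL SFTTrainer (or a plain Trainer).
```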
## 🧠 Reasoning Capability
Thanks to the integration of SmolTalk and a diverse set of multi-task prompts, the model learns:
- Chain-of-thought style reasoning
- Conversational grounding
- Multi-step logical inferences
- Instruction following across domains
Example:

    ### Task: Explain reasoning
    ### Input:
    If a train leaves City A at 3 PM and arrives at City B at 6 PM, covering 180 km, what is its average speed?
    ### Output:
    The train travels 180 km in 3 hours.
    Average speed = 180 ÷ 3 = 60 km/h.
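A minimal generation sketch using this `### Task` / `### Input` / `### Output` prompt layout, reusing the `model` and `tokenizer` from the Overview snippet (sampling parameters are illustrative, not tuned values):

```python
prompt = (
    "### Task: Explain reasoning\n"
    "### Input:\n"
    "If a train leaves City A at 3 PM and arrives at City B at 6 PM, "
    "covering 180 km, what is its average speed?\n"
    "### Output:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
)

# Decode only the tokens generated after the prompt
answer = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(answer)
```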