# qwen3-0.6b-mlx-my1stVS

Fine-tuned with the Apple MLX framework.
This model is a fine-tuned version of Qwen3-0.6B optimized for Apple Silicon (M1/M2/M3/M4) using the MLX framework.
## MLX Framework Benefits
- 2-10x faster inference on Apple Silicon
- 50-80% lower memory usage with quantization (see the conversion sketch after this list)
- Native Apple optimization for M-series chips
- Easy deployment without CUDA dependencies
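The 4-bit weights used for this model can be produced with mlx-lm's conversion utility. Below is a minimal sketch, assuming the Python `convert` API of a recent mlx-lm release; the output directory name is illustrative.

```python
from mlx_lm import convert

# Convert the Hugging Face checkpoint to MLX format and quantize it to 4 bits
# (mlx_path is an illustrative output directory; quantize=True defaults to 4-bit)
convert(
    "Qwen/Qwen3-0.6B",
    mlx_path="qwen3-0.6b-mlx-4bit",
    quantize=True,
)
```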
## Quick Start

### Using with MLX (recommended for Apple Silicon)
```python
from mlx_lm import load, generate

# Load the fine-tuned model from the Hugging Face Hub
model, tokenizer = load("TJ498/qwen3-0.6b-mlx-my1stVS")

# Generate text using the instruction-response format the model was trained on
prompt = "### Instruction: What is Apple MLX?\n\n### Response:"
response = generate(model, tokenizer, prompt, max_tokens=100)
print(response)
```
### Using LoRA Adapters
```bash
# Clone the repository (contains the converted MLX model and the LoRA adapters)
git clone https://huggingface.co/TJ498/qwen3-0.6b-mlx-my1stVS
cd qwen3-0.6b-mlx-my1stVS

# Generate with the adapters applied on top of the converted model
python -m mlx_lm.generate --model ./mlx_model --adapter-path ./adapters --prompt "Your prompt"
```
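The same thing can be done from Python. In recent mlx-lm releases, `load` accepts an `adapter_path` argument that applies LoRA adapters on top of the base weights; a minimal sketch, assuming the cloned directory layout above:

```python
from mlx_lm import load, generate

# Load the converted MLX model and apply the LoRA adapters from the cloned repo
# (paths assume you are inside the cloned qwen3-0.6b-mlx-my1stVS directory)
model, tokenizer = load("./mlx_model", adapter_path="./adapters")

prompt = "### Instruction: What is Apple MLX?\n\n### Response:"
print(generate(model, tokenizer, prompt, max_tokens=100))
```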
## Model Details
- Base Model: Qwen/Qwen3-0.6B
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Framework: Apple MLX
- Training Date: 2025-07-22
- Parameters: ~600M base + ~0.66M LoRA adapters
- Quantization: 4-bit
- Memory Usage: ~0.5GB for inference
## Training Details
- Training Iterations: 50
- Batch Size: 1
- Learning Rate: 1e-05
- LoRA Rank: 16
- LoRA Alpha: 16
## Usage Examples
The model is trained to follow an instruction-response format:

```
### Instruction: Your question here

### Response: Model's answer
```
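In practice this just means wrapping your question in that template before calling `generate`. The `ask` helper below is an illustrative sketch, not part of the model or mlx-lm:

```python
from mlx_lm import load, generate

# Load once and reuse for multiple questions
model, tokenizer = load("TJ498/qwen3-0.6b-mlx-my1stVS")

def ask(question: str, max_tokens: int = 100) -> str:
    """Wrap a question in the instruction-response template used during fine-tuning."""
    prompt = f"### Instruction: {question}\n\n### Response:"
    return generate(model, tokenizer, prompt, max_tokens=max_tokens)

print(ask("What is Apple MLX?"))
```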
## Performance
Optimized for Apple Silicon with significant performance improvements:
- Inference Speed: 150-200 tokens/sec on M1/M2/M3
- Memory Efficiency: <1GB memory usage
- Power Consumption: 60% less than traditional frameworks
## Requirements
- Apple Silicon Mac (M1/M2/M3/M4)
- macOS 13.3 or later
- Python 3.9+
- MLX framework: `pip install mlx mlx-lm`
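After installation, a quick check like the one below (a minimal sketch) confirms that MLX is running on the Apple Silicon GPU:

```python
import mlx.core as mx

# Confirm that MLX is installed and using the Apple Silicon GPU
print(mx.default_device())        # expected: Device(gpu, 0) on Apple Silicon
print(mx.metal.is_available())    # expected: True when the Metal backend is usable
```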
## License

Apache License 2.0
## Model Hub
This model is available on the Hugging Face Hub: https://huggingface.co/TJ498/qwen3-0.6b-mlx-my1stVS
Fine-tuned with ❤️ using the Apple MLX framework.