# Qwen2.5-1.5B-Executorch-Q8DA4W
This repository contains the `qwen2_5_1_5b_q8da4w.pte` model, exported for use with ExecuTorch.
## Details
- Base Model: Qwen/Qwen2.5-1.5B-Instruct
- Format: `.pte` (ExecuTorch)
- Quantization: Q8DA4W (4-bit linear weights, 8-bit dynamic activations)
- Architecture: Qwen2
- File Size: ~1.6 GB
## Features
- 🚀 Optimized for mobile/edge devices
- 📱 Compatible with `react-native-executorch`
- 🌍 Excellent multilingual support (including Vietnamese!)
- 💬 Strong instruction-following capabilities
- 🧠 Built on Alibaba's Qwen2.5, a family known for strong reasoning
## Usage
This model is ready to be used in mobile applications (iOS/Android) via the ExecuTorch runtime or react-native-executorch.
- Download `qwen2_5_1_5b_q8da4w.pte` and the tokenizer files (`tokenizer.json`, `vocab.json`, `merges.txt`).
- Place them in your app's asset folder.
- Load the model with the ExecuTorch runtime (see the sketch below for a `react-native-executorch` example).
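If you integrate through `react-native-executorch`, the flow typically looks like the sketch below. This is a minimal, hedged example: the `useLLM` hook exists in the library, but the exact prop and field names (`modelSource`, `tokenizerSource`, `isReady`, `response`) and whether `generate` takes a string or a messages array vary between releases, so treat them as assumptions and check the documentation of the version you install.

```tsx
// Minimal sketch (assumed API): prop and field names below follow recent
// react-native-executorch releases and may differ in the version you install.
import { useLLM } from 'react-native-executorch';

export function QwenDemo() {
  const llm = useLLM({
    // Bundled assets; the paths are placeholders for wherever you ship the files.
    modelSource: require('../assets/qwen2_5_1_5b_q8da4w.pte'),
    tokenizerSource: require('../assets/tokenizer.json'),
  });

  const ask = async () => {
    if (!llm.isReady) return;                   // readiness flag name is an assumption
    await llm.generate('Xin chào! Bạn là ai?'); // some versions expect a messages array instead
    console.log(llm.response);                  // generated output exposed by the hook
  };

  // Wire `ask` to a button and render `llm.response` in your UI.
  return null;
}
```

Native iOS/Android apps can load the same `.pte` file directly through the ExecuTorch runtime APIs instead of going through React Native.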
## Notes
- Qwen2 uses a byte-level BPE tokenizer (similar to GPT-2), not SentencePiece.
- Tokenizer files: `tokenizer.json`, `vocab.json`, `merges.txt`
- Vocab size: 151,936 tokens
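To sanity-check the tokenizer outside the app (for example in a Node.js script), the same byte-level BPE vocabulary can be loaded with transformers.js. This is only an illustrative sketch: the `@huggingface/transformers` package and the `AutoTokenizer` calls shown are assumptions about that library, not something shipped in this repository.

```ts
// Sketch: inspect the Qwen2 byte-level BPE tokenizer with transformers.js.
// Run with Node.js (ESM); from_pretrained downloads the tokenizer files.
import { AutoTokenizer } from '@huggingface/transformers';

const tokenizer = await AutoTokenizer.from_pretrained('Qwen/Qwen2.5-1.5B-Instruct');

// Byte-level BPE: arbitrary UTF-8 text (e.g. Vietnamese) maps to ids within the
// 151,936-token vocabulary, with no out-of-vocabulary failures.
const ids = tokenizer.encode('Xin chào, thế giới!');
console.log(ids);
```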