SmolLM2-1.7B-Executorch-Q8DA4W
This repository contains the smollm2_1_7b_q8da4w.pte model, exported for use with ExecuTorch.
Details
- Base Model: HuggingFaceTB/SmolLM2-1.7B-Instruct
- Format: .pte (ExecuTorch)
- Quantization: Q8DA4W (4-bit linear weights, 8-bit dynamic activations)
- Architecture: llama (compatible with the Llama export pipeline)
- File Size: ~1.7 GB
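Q8DA4W abbreviates "8-bit dynamic activations, 4-bit weights": weights are quantized once at export time, while activation scales are computed on the fly at inference. A minimal pure-Python sketch of the idea, with illustrative values and a single quantization group (the actual export typically uses torchao's 8da4w quantizer with per-group weight scales):

```python
# Sketch of the Q8DA4W idea: 4-bit symmetric weights (static, per group)
# plus 8-bit symmetric activations (dynamic, scaled per inference).
# Values below are illustrative, not from the real model.

def quantize_symmetric(values, n_bits):
    """Map floats to signed n-bit ints with a single shared scale."""
    qmax = 2 ** (n_bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    scale = max(abs(v) for v in values) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale

# Weights: quantized to 4 bits once, at export time.
weights = [0.12, -0.40, 0.33, 0.05]
w_q, w_scale = quantize_symmetric(weights, 4)

# Activations: quantized to 8 bits dynamically, from the live tensor.
activations = [1.5, -2.2, 0.7, 3.1]
a_q, a_scale = quantize_symmetric(activations, 8)

# Integer dot product, then rescale the accumulator back to float.
approx = sum(w * a for w, a in zip(w_q, a_q)) * w_scale * a_scale
exact = sum(w * a for w, a in zip(weights, activations))
print(round(approx, 3), round(exact, 3))
```

The integer result stays close to the float reference while the weight storage drops to roughly a quarter of fp16, which is why the 1.7B model fits in about 1.7 GB.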
Features
- 🚀 Optimized for mobile/edge devices
- 📱 Compatible with react-native-executorch
- 💡 SmolLM2 is efficient and fast in resource-constrained environments
- 🗣️ Instruct-tuned for conversational AI
Usage
This model is ready to be used in mobile applications (iOS/Android) via the ExecuTorch runtime or react-native-executorch.
- Download smollm2_1_7b_q8da4w.pte and the tokenizer files (tokenizer.json, vocab.json, merges.txt).
- Place them in your app's asset folder.
- Load them with the ExecuTorch runtime.
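Before bundling the file into a mobile app, you can sanity-check it on desktop with ExecuTorch's Python runtime. A sketch, assuming the executorch pip package is installed and the .pte file sits in the working directory (the "forward" method name is the usual default for Llama-style exports, but verify it against your export):

```python
# Hypothetical desktop smoke test for the exported program.
# On device you would use the C++/Swift/Kotlin runtime or
# react-native-executorch instead of this Python API.
from executorch.runtime import Runtime

runtime = Runtime.get()
program = runtime.load_program("smollm2_1_7b_q8da4w.pte")
method = program.load_method("forward")
# outputs = method.execute([input_token_tensor, ...])
```

If loading succeeds here, a failure on device is more likely an asset-path or runtime-version issue than a broken export.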
Notes
- SmolLM2 uses a byte-level BPE tokenizer (similar to GPT-2), not SentencePiece as Llama does.
- Tokenizer files: tokenizer.json, vocab.json, merges.txt
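The byte-level detail matters when you inspect vocab.json: GPT-2-style tokenizers first map every raw byte to a printable Unicode character, so a leading space shows up as "Ġ" in the vocabulary. A small sketch of that byte-to-character table, reimplemented here for illustration (the real table ships inside tokenizer.json):

```python
# GPT-2-style byte-to-unicode table, reimplemented for illustration.
# Byte-level BPE rewrites raw bytes as printable characters and then
# runs BPE merges on those characters -- no SentencePiece involved.
def bytes_to_unicode():
    # Printable bytes keep their own character...
    keep = (list(range(ord("!"), ord("~") + 1))
            + list(range(ord("¡"), ord("¬") + 1))
            + list(range(ord("®"), ord("ÿ") + 1)))
    table = {b: chr(b) for b in keep}
    # ...all others (space, control bytes, etc.) are shifted above 255.
    n = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + n)
            n += 1
    return table

table = bytes_to_unicode()
visible = "".join(table[b] for b in " hello".encode("utf-8"))
print(visible)  # Ġhello -- the Ġ marks the leading space
```

This is why vocab entries like "Ġhello" are normal and expected; they decode back to " hello".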