SmolLM2-1.7B-Executorch-Q8DA4W

This repository contains the smollm2_1_7b_q8da4w.pte model, exported for use with ExecuTorch.

Details

  • Base Model: HuggingFaceTB/SmolLM2-1.7B-Instruct
  • Format: .pte (ExecuTorch)
  • Quantization: Q8DA4W (4-bit linear weights, 8-bit dynamic activations)
  • Architecture: llama (compatible with Llama export pipeline)
  • File Size: ~1.7 GB
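To make the Q8DA4W label concrete, here is a minimal, self-contained sketch of the idea behind it: weights are quantized offline to 4-bit integers with a fixed scale, while activations are quantized dynamically to 8-bit at runtime using a scale computed from the current input. This is an illustration only, not the actual ExecuTorch kernels; all values and the helper function are made up for the example.

```python
# Hedged sketch of the Q8DA4W idea (illustrative only, not the real kernels):
# 4-bit weights quantized ahead of time, 8-bit activations quantized per input.

def quantize_symmetric(values, n_bits):
    """Symmetric quantization to signed n-bit integers with a single scale."""
    qmax = 2 ** (n_bits - 1) - 1              # 7 for 4-bit, 127 for 8-bit
    scale = max(abs(v) for v in values) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale

# The "4W" part: weights are quantized once, offline.
weights = [0.12, -0.48, 0.33, 0.07]
w_q, w_scale = quantize_symmetric(weights, n_bits=4)

# The "Q8DA" part: activation scale is computed dynamically, per input.
activations = [1.5, -0.2, 0.9, 2.1]
a_q, a_scale = quantize_symmetric(activations, n_bits=8)

# Integer dot product, rescaled back to floating point.
approx = sum(wq * aq for wq, aq in zip(w_q, a_q)) * w_scale * a_scale
exact = sum(w * a for w, a in zip(weights, activations))
print(round(approx, 3), round(exact, 3))
```

The integer result stays close to the float result, which is why this scheme can cut model size roughly in half versus 8-bit weights with only a modest accuracy cost.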

Features

  • 🚀 Optimized for mobile/edge devices
  • 📱 Compatible with react-native-executorch
  • 💡 SmolLM2 is efficient and fast for resource-constrained environments
  • 🗣️ Instruct-tuned for conversational AI

Usage

This model is ready to be used in mobile applications (iOS/Android) via the ExecuTorch runtime or react-native-executorch.

  1. Download smollm2_1_7b_q8da4w.pte and the tokenizer files (tokenizer.json, vocab.json, merges.txt).
  2. Place them in your app's asset folder.
  3. Load with ExecuTorch runtime.
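Before bundling the file into an app, the steps above can be smoke-tested on a desktop with ExecuTorch's Python runtime. A hedged sketch, assuming `executorch` is installed via pip and the `.pte` file sits in the working directory; the `Runtime.get` / `load_program` / `load_method` calls follow the ExecuTorch Python API docs, but verify against the version you have installed:

```python
# Hedged smoke-test sketch: load the exported .pte with the ExecuTorch Python
# runtime (requires `pip install executorch`; API may change between releases).
loaded = False
try:
    from executorch.runtime import Runtime

    runtime = Runtime.get()
    program = runtime.load_program("smollm2_1_7b_q8da4w.pte")
    method = program.load_method("forward")  # method names are baked into the .pte
    loaded = True
except (ImportError, FileNotFoundError, RuntimeError) as exc:
    # Fails gracefully when executorch or the model file is unavailable.
    print("could not load model:", exc)
print("model loaded:", loaded)
```

In a React Native app, the same files are instead loaded through react-native-executorch's hooks; see that library's documentation for the current API.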

Notes

  • SmolLM2 uses a byte-level BPE tokenizer (similar to GPT-2), not a SentencePiece tokenizer like Llama.
  • The tokenizer files are tokenizer.json, vocab.json, and merges.txt.
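The practical difference is that a byte-level BPE vocabulary covers every possible byte, so no input can produce an unknown token. GPT-2-style tokenizers do this by mapping all 256 byte values to printable Unicode characters before merging; the following self-contained sketch reproduces that mapping (the function mirrors the well-known GPT-2 `bytes_to_unicode` helper and is shown only to illustrate why, for example, a leading space appears as "Ġ" in vocab.json):

```python
# Sketch of the GPT-2-style byte-to-unicode mapping used by byte-level BPE.
# Printable bytes map to themselves; the rest get shifted into printable
# code points so every byte has a visible stand-in in the vocabulary.

def bytes_to_unicode():
    """Return a dict mapping each byte value (0-255) to a printable character."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:          # non-printable byte: assign a shifted code point
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

mapping = bytes_to_unicode()
print(mapping[ord(" ")], mapping[ord("a")])   # space becomes "Ġ", "a" stays "a"
```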