---
license: llama3.1
pipeline_tag: text-classification
library_name: onnx
base_model: meta-llama/Prompt-Guard-86M
tags:
- onnx
- onnxruntime
- quantized
---

# Prompt-Guard-86M — ONNX (INT8)

**Built with Llama.** This repo provides a quantized ONNX Runtime export of `meta-llama/Prompt-Guard-86M`.

## What’s inside

- `model.onnx`: INT8 quantized graph (ONNX Runtime dynamic quantization)
- `tokenizer.json` / `tokenizer_config.json`
- `config.json`
- `LICENSE` (Llama 3.1 Community License) and `NOTICE`

## How it was made

- Export: 🤗 Optimum ONNX exporter
- Quantization: ONNX Runtime (dynamic, per-channel where supported)
- Commands: `optimum-cli export onnx ...`, then `onnxruntime.quantization ...` (a hedged reproduction sketch is at the end of this card)
- Environment: onnxruntime==, optimum==

(See the Optimum and ONNX Runtime docs for details.)

## Usage (Python)

```python
import onnxruntime as ort
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("/Prompt-Guard-86M-onnx-int8")
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

text = "your input"
enc = tok(text, return_tensors="np", padding=True, truncation=True)

# Feed only the inputs the graph actually declares; the tokenizer may
# emit extras (e.g. token_type_ids) that the exported model does not accept.
input_names = {i.name for i in session.get_inputs()}
outputs = session.run(None, {k: v for k, v in enc.items() if k in input_names})
logits = outputs[0]  # shape: (batch, num_labels)
```
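## Interpreting the output

The graph returns raw logits; the label names live in this repo’s `config.json` under `id2label`, as is standard for Hugging Face classification configs. Below is a minimal post-processing sketch, continuing from the `logits` array above and assuming `config.json` sits in the working directory:

```python
import json

import numpy as np

# id2label comes from this repo's config.json; JSON keys are strings,
# so convert them back to ints.
with open("config.json") as f:
    id2label = {int(k): v for k, v in json.load(f)["id2label"].items()}

def softmax(x, axis=-1):
    # Numerically stable softmax over the label dimension.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

probs = softmax(logits)
pred = int(probs[0].argmax())
print(id2label[pred], float(probs[0][pred]))
```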
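## Reproducing the export (sketch)

The exact commands and versions were elided above. The following is a minimal sketch of one way to reproduce the pipeline, using Optimum’s `ORTModelForSequenceClassification` for the export and ONNX Runtime’s dynamic quantizer; the output paths (`onnx-fp32/`, `model.onnx`) are illustrative, not the ones actually used:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic
from optimum.onnxruntime import ORTModelForSequenceClassification

# Step 1: export the FP32 model to ONNX via Optimum
# (the Python equivalent of `optimum-cli export onnx --model meta-llama/Prompt-Guard-86M ...`).
model = ORTModelForSequenceClassification.from_pretrained(
    "meta-llama/Prompt-Guard-86M", export=True
)
model.save_pretrained("onnx-fp32")

# Step 2: INT8 dynamic quantization of the exported graph.
# per_channel matches the "per-channel where supported" note above.
quantize_dynamic(
    "onnx-fp32/model.onnx",
    "model.onnx",
    per_channel=True,
    weight_type=QuantType.QInt8,
)
```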