# Prompt-Guard-86M — ONNX (INT8)
Built with Llama.
This repo provides an INT8-quantized ONNX Runtime export of meta-llama/Prompt-Guard-86M, Meta's classifier for detecting prompt injection and jailbreak attempts.
## What’s inside
- `model.onnx`: INT8 quantized graph (ONNX Runtime dynamic quantization)
- `tokenizer.json` / `tokenizer_config.json`
- `config.json`
- `LICENSE` (Llama 3.1 Community License) and `NOTICE`
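A quick way to sanity-check the downloaded artifacts, assuming they sit in the current directory (this uses the `onnx` package, which is an extra dependency beyond the runtime):

```python
import onnx

# Load the quantized graph and run ONNX's structural validator;
# check_model raises if the file is malformed or inconsistent.
model = onnx.load("model.onnx")
onnx.checker.check_model(model)
print(len(model.graph.node), "nodes")
```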
## How it was made
- Export: 🤗 Optimum ONNX exporter
- Quantization: ONNX Runtime (dynamic, per-channel where supported)
- Command: `optimum-cli export onnx ...` then `onnxruntime.quantization ...` (a sketch of both steps follows this list)
- Environment: onnxruntime==, optimum== (See Optimum/ONNX docs for details.)
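A minimal sketch of the two steps above, not a record of the exact command used here; the output paths and the `--task text-classification` flag are assumptions:

```python
# Step 1 (shell), assuming a text-classification task head:
#   optimum-cli export onnx --model meta-llama/Prompt-Guard-86M \
#       --task text-classification onnx_fp32/
from onnxruntime.quantization import QuantType, quantize_dynamic

# Step 2: dynamic INT8 quantization of the exported FP32 graph,
# with per-channel weight quantization where the op type supports it.
quantize_dynamic(
    model_input="onnx_fp32/model.onnx",
    model_output="model.onnx",
    weight_type=QuantType.QInt8,
    per_channel=True,
)
```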
## Usage (Python)
```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("<you>/Prompt-Guard-86M-onnx-int8")
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

text = "your input"
enc = tok(text, return_tensors="np", padding=True, truncation=True)

# Feed only the inputs the exported graph declares, cast to int64 for ORT.
input_names = {i.name for i in session.get_inputs()}
feed = {k: v.astype(np.int64) for k, v in enc.items() if k in input_names}
logits = session.run(None, feed)[0]
```
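The session returns raw class logits. Continuing from the snippet above, a sketch of label lookup via the bundled `config.json`; the `id2label` mapping is assumed to match the upstream classifier:

```python
from transformers import AutoConfig

# Numerically stable softmax over the logits, then map the argmax id
# to its label name from config.json.
config = AutoConfig.from_pretrained("<you>/Prompt-Guard-86M-onnx-int8")
shifted = logits - logits.max(axis=-1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
pred = int(probs.argmax(axis=-1)[0])
print(config.id2label[pred], float(probs[0, pred]))
```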