Prompt-Guard-86M — ONNX (INT8)

Built with Llama.

This repo provides an ONNX export of meta-llama/Prompt-Guard-86M, quantized to INT8 for use with ONNX Runtime.

What’s inside

  • model.onnx: INT8 quantized graph (ONNX Runtime dynamic quantization)
  • tokenizer.json / tokenizer_config.json
  • config.json
  • LICENSE (Llama 3.1 Community License) and NOTICE

How it was made

  • Export: 🤗 Optimum ONNX exporter
  • Quantization: ONNX Runtime (dynamic, per-channel where supported)
  • Command: optimum-cli export onnx ... then onnxruntime.quantization ... (a sketch of both steps follows below)
  • Environment: onnxruntime==, optimum== (versions not pinned here; see the Optimum/ONNX Runtime docs for details.)
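
A minimal sketch of both steps, assuming default settings and illustrative paths; the exact flags and versions used for this repo are not recorded, and per_channel availability depends on your onnxruntime version:

# Step 1 (shell): export the checkpoint to ONNX with Optimum:
#   optimum-cli export onnx --model meta-llama/Prompt-Guard-86M --task text-classification onnx/
# Step 2 (Python): dynamic INT8 quantization with ONNX Runtime:
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="onnx/model.onnx",        # FP32 graph from the Optimum export
    model_output="onnx/model.int8.onnx",  # INT8 graph, published here as model.onnx
    weight_type=QuantType.QInt8,          # quantize weights to signed INT8
    per_channel=True,                     # per-channel where the op supports it
)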

Usage (Python)

import onnxruntime as ort
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Derbdale/Llama-Prompt-Guard-86M-ONNX")
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

text = "your input"
enc = tok(text, return_tensors="np", padding=True, truncation=True)

# Pass only the inputs the exported graph declares
# (some exports omit token_type_ids).
input_names = {i.name for i in session.get_inputs()}
outputs = session.run(None, {k: v for k, v in enc.items() if k in input_names})
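
The first output is the classification logits. A minimal sketch of turning them into per-label scores; the label names come from id2label in this repo's config.json:

import numpy as np

logits = outputs[0]                                     # shape: (batch, num_labels)
probs = np.exp(logits - logits.max(-1, keepdims=True))  # numerically stable softmax
probs = probs / probs.sum(-1, keepdims=True)
print({i: float(p) for i, p in enumerate(probs[0])})    # index -> score; map via id2label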