Prompt-Guard-86M — ONNX (INT8)

Built with Llama.

This repo provides an ONNX export of meta-llama/Prompt-Guard-86M, quantized to INT8 for use with ONNX Runtime.

What’s inside

  • model.onnx: INT8 quantized graph (ONNX Runtime dynamic quantization)
  • tokenizer.json / tokenizer_config.json
  • config.json
  • LICENSE (Llama 3.1 Community License) and NOTICE

How it was made

  • Export: 🤗 Optimum ONNX exporter
  • Quantization: ONNX Runtime (dynamic, per-channel where supported)
  • Command: optimum-cli export onnx ... then onnxruntime.quantization ... (a sketch of both steps follows below)
  • Environment: onnxruntime==, optimum== (versions not pinned here; see the Optimum/ONNX Runtime docs for details.)
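
A minimal sketch of both steps, assuming default settings and illustrative paths; the exact flags and versions used for this repo are not recorded, and per_channel availability depends on your onnxruntime version:

# Step 1 (shell): export the checkpoint to ONNX with Optimum:
#   optimum-cli export onnx --model meta-llama/Prompt-Guard-86M --task text-classification onnx/
# Step 2 (Python): dynamic INT8 quantization with ONNX Runtime:
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="onnx/model.onnx",        # FP32 graph from the Optimum export
    model_output="onnx/model.int8.onnx",  # INT8 graph, published here as model.onnx
    weight_type=QuantType.QInt8,          # quantize weights to signed INT8
    per_channel=True,                     # per-channel where the op supports it
)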

Usage (Python)

import onnxruntime as ort
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Derbdale/Llama-Prompt-Guard-86M-ONNX")
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

text = "your input"
enc = tok(text, return_tensors="np", padding=True, truncation=True)

# Pass only the inputs the exported graph declares
# (some exports omit token_type_ids).
input_names = {i.name for i in session.get_inputs()}
outputs = session.run(None, {k: v for k, v in enc.items() if k in input_names})
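
The first output is the classification logits. A minimal sketch of turning them into per-label scores; the label names come from id2label in this repo's config.json:

import numpy as np

logits = outputs[0]                                     # shape: (batch, num_labels)
probs = np.exp(logits - logits.max(-1, keepdims=True))  # numerically stable softmax
probs = probs / probs.sum(-1, keepdims=True)
print({i: float(p) for i, p in enumerate(probs[0])})    # index -> score; map via id2label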