dystrio/Mistral-7B-Instruct-v0.3-sculpt-throughput

23% smaller, +20% faster prefill, drop-in replacement. No custom kernels. No runtime changes.

Dystrio Sculpt structurally compresses transformer models, producing dense models that load with standard transformers — no custom code, no new ops, no deployment friction.

This is the Throughput tier of Mistral 7B Instruct v0.3.

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "dystrio/Mistral-7B-Instruct-v0.3-sculpt-throughput",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("dystrio/Mistral-7B-Instruct-v0.3-sculpt-throughput")

inputs = tokenizer("The future of AI inference is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Benchmark Results

All tiers compiled from Mistral 7B Instruct v0.3 on A100 80GB, bf16:

| Model | PPL | PPL Ratio | Weights (GB) | Chat Prefill TPS | RAG TTFT p95 (ms) | Decode TPS |
|---|---|---|---|---|---|---|
| Baseline | 12.5983 | 1.0 | 13.500496 | 10557.3 | 133.325 | 66.8 |
| sculpt-default | 11.6283 | 0.923 | 12.000496 | 11594.3 | 123.069 | 65.3 |
| sculpt-production | 14.2859 | 1.134 | 11.250496 | 12093.9 | 120.842 | 66.0 |
| sculpt-throughput | 16.3355 | 1.2966 | 10.406746 | 12667.0 | 112.683 | 65.8 |
| sculpt-experimental | 25.1515 | 1.9964 | 9.562996 | 13595.9 | 110.293 | 66.5 |
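The PPL Ratio column is each tier's perplexity divided by the baseline's. A quick sanity check recomputes the ratios from the raw perplexities in the table above (all figures copied from the rows, nothing new):

```python
# Recompute each tier's PPL ratio from the raw perplexities above.
baseline_ppl = 12.5983

tier_ppl = {
    "sculpt-default": 11.6283,
    "sculpt-production": 14.2859,
    "sculpt-throughput": 16.3355,
    "sculpt-experimental": 25.1515,
}

# Ratio > 1.0 means perplexity (and thus quality) regressed vs. baseline.
ratios = {name: ppl / baseline_ppl for name, ppl in tier_ppl.items()}
for name, r in ratios.items():
    print(f"{name}: {r:.4f}")
```

The printed values round to the table's 0.923, 1.134, 1.2966, and 1.9964.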

Key Metrics (this model)

| Metric | Value |
|---|---|
| Weights memory | 10.406746 GB (23% smaller) |
| PPL ratio | 1.2966 |
| Chat prefill TPS | 12667.0 (+20%) |
| RAG TTFT p95 | 112.683 ms (-15%) |
| Decode TPS | 65.8 (flat) |
| Parameters | 5.59B |
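The headline numbers follow directly from the benchmark table: parameter memory is parameter count times two bytes in bf16 (the close match suggests the "GB" column is measured in GiB), and the percentage deltas compare this tier to the baseline row. A quick arithmetic check, using only figures already stated in this card:

```python
# Weights memory: 5.59B parameters * 2 bytes (bf16), reported in GiB.
params = 5.59e9
weights_gib = params * 2 / 1024**3          # ~10.41, matching 10.406746

# Deltas vs. the baseline row of the benchmark table.
size_saving = 1 - 10.406746 / 13.500496     # ~0.23  -> "23% smaller"
prefill_gain = 12667.0 / 10557.3 - 1        # ~0.20  -> "+20% faster prefill"
ttft_drop = 1 - 112.683 / 133.325           # ~0.15  -> "-15% TTFT p95"

print(f"weights: {weights_gib:.2f} GiB")
print(f"{size_saving:.1%} smaller, {prefill_gain:+.1%} prefill, {ttft_drop:.1%} lower TTFT")
```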

All Sculpt Tiers

| Tier | HuggingFace | Size | PPL Ratio | Use Case |
|---|---|---|---|---|
| default | dystrio/Mistral-7B-Instruct-v0.3-sculpt-default | 12.000496 GB | 0.923 | Zero-regret: quality preserved, smaller footprint |
| production | dystrio/Mistral-7B-Instruct-v0.3-sculpt-production | 11.250496 GB | 1.134 | Practical savings with modest quality tradeoff |
| throughput | dystrio/Mistral-7B-Instruct-v0.3-sculpt-throughput 👈 this model | 10.406746 GB | 1.2966 | Maximum usable compression for speed/edge |
| experimental | dystrio/Mistral-7B-Instruct-v0.3-sculpt-experimental | 9.562996 GB | 1.9964 | Boundary exploration, maximum structural compression |

What is Dystrio Sculpt?

Dystrio Sculpt compiles transformer models into smaller, faster variants. Output models:

  • Are dense (not sparse) — standard architecture, fewer parameters
  • Load with standard HuggingFace Transformers — no custom code needed
  • Require no custom kernels and no runtime changes
  • Work as a one-step compile before deployment
  • Stack with quantization (AWQ, GPTQ, GGUF) for compound savings

Compatibility

  • ✅ HuggingFace Transformers
  • ✅ vLLM
  • ✅ TGI (Text Generation Inference)
  • ✅ llama.cpp / GGUF conversion
  • ✅ AWQ / GPTQ quantization
  • ✅ Any framework that loads standard safetensors

Benchmark Environment

  • GPU: NVIDIA A100-SXM4-80GB
  • dtype: bf16
  • Torch: 2.10.0+cu128
  • Transformers: 5.3.0
  • Deterministic: True
  • Single-GPU, standard HuggingFace Transformers, no custom kernels.

Metric Definitions

  • PPL ratio: WikiText-103 perplexity relative to baseline; values below 1.0 mean quality improved.
  • Prefill TPS: Tokens per second during prompt encoding (higher = faster).
  • TTFT p95: Time to first token at 95th percentile (lower = faster).
  • Decode TPS: Tokens per second during generation (higher = faster).
  • Weights (GB): Model parameter memory (deterministic, runtime-independent).
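Perplexity is the exponential of the mean per-token negative log-likelihood, so the PPL ratio reduces to a ratio of two such exponentials. A minimal illustration with toy NLL values (illustrative only, not the actual WikiText-103 evaluation):

```python
import math

def perplexity(nlls):
    """Perplexity = exp(mean per-token negative log-likelihood)."""
    return math.exp(sum(nlls) / len(nlls))

# Toy per-token NLLs for a baseline and a compressed model (made up for illustration).
baseline_nlls = [2.31, 2.74, 2.02, 2.98]
compressed_nlls = [2.55, 2.91, 2.33, 3.12]

base_ppl = perplexity(baseline_nlls)
comp_ppl = perplexity(compressed_nlls)
print(f"PPL ratio: {comp_ppl / base_ppl:.4f}")  # >1.0 means quality regressed vs. baseline
```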

Citation

@misc{dystrio_sculpt_2026,
  title={Dystrio Sculpt: Structural Compilation for Transformer LLMs},
  author={Dystrio},
  year={2026},
  url={https://huggingface.co/dystrio}
}

Downstream Benchmarks (lm-eval)

Evaluated with lm-eval-harness on A100-80GB, bf16, zero-shot.

| Benchmark | Baseline | This Model | Delta |
|---|---|---|---|
| ARC-Challenge | 0.5794 | 0.3797 | -0.1997 |
| HellaSwag | 0.6573 | 0.5075 | -0.1498 |
| MMLU | 0.5975 | 0.3982 | -0.1993 |
| TruthfulQA MC2 | 0.5939 | 0.4860 | -0.1079 |