Build the fastest open-source, vLLM-based speculative decoding system for your own model using ArcticTraining and ArcticInference!
The table below shows the throughput (tokens/s) of gpt-oss-120b on 8xH100 using vLLM:
| method | ShareGPT | HumanEval | 
|---|---|---|
| vLLM V1 Baseline | 220.2 | 220.7 | 
| ArcticSpeculator | 377.3 | 400.0 | 
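To make the comparison easier to read, the throughput numbers can be converted into speedup factors over the baseline. The snippet below is a minimal sketch of that arithmetic. The dictionary layout and the `speedup` helper are illustrative names, not part of ArcticInference or vLLM; the figures are copied from the table.

```python
# Throughput figures (tokens/s) copied from the table above.
# The structure and names here are illustrative only.
throughput = {
    "ShareGPT": {"baseline": 220.2, "speculator": 377.3},
    "HumanEval": {"baseline": 220.7, "speculator": 400.0},
}


def speedup(baseline: float, speculator: float) -> float:
    """Return the speculator's throughput as a multiple of the baseline."""
    return speculator / baseline


speedups = {name: speedup(**row) for name, row in throughput.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
# ShareGPT: 1.71x
# HumanEval: 1.81x
```

On these two benchmarks, ArcticSpeculator delivers roughly 1.7x to 1.8x the throughput of the vLLM V1 baseline.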
For more details about ArcticSpeculator and how to use it:
See all of the speculators we have released in our Speculators Collection.