Outlook

We have quantised the model to 8-bit so that it can be served at scale on low-end GPU cards. The quantisation was done with the llama.cpp library.
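As a minimal sketch of running the quantised model, the snippet below loads the 8-bit GGUF through llama-cpp-python, the Python bindings for llama.cpp. The filename pattern and generation parameters are assumptions rather than part of this repository's documentation, so check the repo's file listing for the exact GGUF name.

```python
# pip install llama-cpp-python huggingface_hub
from llama_cpp import Llama

# Download the GGUF from the Hub and load it.
# The filename pattern is an assumption -- match it to the actual file in the repo.
llm = Llama.from_pretrained(
    repo_id="sleeping-ai/Qwen2-Math-1.5B-GGUF",
    filename="*q8_0.gguf",   # assumed pattern for the 8-bit quantised file
    n_gpu_layers=-1,         # offload all layers to the GPU when one is available
    n_ctx=2048,              # context window; raise it for longer problems
)

out = llm(
    "Solve for x: 2x + 3 = 11.",
    max_tokens=128,
    temperature=0.0,         # greedy decoding suits short math answers
)
print(out["choices"][0]["text"])
```

At 8 bits per weight, a model of this size needs on the order of 2 GB for the weights alone, which is what makes it usable on low-end GPU cards.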

Format: GGUF
Model size: 2B params
Architecture: qwen2
Quantisation: 8-bit