QwQ-32B-Preview-bnb-4bit

Introduction

QwQ-32B-Preview-bnb-4bit is a 4-bit quantized version of the QwQ-32B-Preview model, utilizing the Bits and Bytes (bnb) quantization technique. This quantization significantly reduces the model's size and inference latency, making it more accessible for deployment on resource-constrained hardware.

Model Details

  • Quantization: 4-bit using Bits and Bytes (bnb)
  • Base Model: Qwen/QwQ-32B-Preview
  • Parameters: 32.5 billion
  • Context Length: Up to 32,768 tokens
Downloads last month
-
Safetensors
Model size
18B params
Tensor type
F32
BF16
U8
Inference Providers NEW
This model isn't deployed by any Inference Provider. 馃檵 Ask for provider support

Model tree for kurcontko/QwQ-32B-Preview-bnb-4bit

Base model

Qwen/Qwen2.5-32B
Quantized
(115)
this model