QwQ-32B-Preview-bnb-4bit

Introduction

QwQ-32B-Preview-bnb-4bit is a 4-bit quantized version of the QwQ-32B-Preview model, produced with the bitsandbytes (bnb) quantization library. Quantizing the weights to 4 bits cuts the model's memory footprint to roughly a quarter of the fp16 original, making it deployable on resource-constrained hardware such as a single consumer GPU.

Model Details

  • Quantization: 4-bit via bitsandbytes (bnb)
  • Base Model: Qwen/QwQ-32B-Preview
  • Parameters: 32.5 billion
  • Context Length: Up to 32,768 tokens
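
Since the checkpoint is already quantized, it can be loaded directly. A minimal sketch, assuming transformers, accelerate, and bitsandbytes are installed and a CUDA GPU with roughly 20+ GB of free memory is available:

```python
# Minimal sketch of loading the pre-quantized checkpoint (assumes
# transformers, accelerate, and bitsandbytes are installed and a CUDA GPU
# with roughly 20+ GB of free memory is available).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kurcontko/QwQ-32B-Preview-bnb-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The 4-bit quantization settings are stored inside the checkpoint, so no
# BitsAndBytesConfig is needed; device_map="auto" places layers on the GPU.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Note that the model name, prompt, and generation length here are illustrative only.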
  • On-disk Format: Safetensors, 34B reported params (tensor dtypes F32, BF16, U8; the 4-bit weights are packed two-per-byte into U8 tensors, which is why the reported count exceeds 32.5B)

Model Tree

  • kurcontko/QwQ-32B-Preview-bnb-4bit is quantized from Qwen/QwQ-32B-Preview, which is in turn derived from the Qwen/Qwen2.5-32B base model.