Qwen3 AWQ Quantized Model Collection

This repository provides AWQ (Activation-aware Weight Quantization) versions of Qwen3 models, optimized for efficient deployment on consumer hardware while maintaining strong performance.

Models Available

  • Qwen3-32B-AWQ  -  4-bit quantized, 32B parameters
  • Qwen3-14B-AWQ  -  4-bit quantized, 14B parameters
  • Qwen3-8B-AWQ  -  4-bit quantized, 8B parameters
  • Qwen3-4B-AWQ  -  4-bit quantized, 4B parameters

Quantization Details

  • Weights: 4-bit precision (AWQ)
  • Activations: 16-bit precision
  • Benefits:
    • Up to 3x memory reduction vs FP16
    • Up to 3x inference speedup on supported hardware
    • Minimal loss in model quality
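As a rough illustration of the memory figures above, the sketch below estimates weight storage for FP16 versus 4-bit AWQ. The group size of 128 and the per-group overhead (one 16-bit scale plus one 4-bit zero point) are assumptions typical of AWQ setups, not values confirmed for these checkpoints, and the function name is hypothetical. The estimate covers weights only; layers left unquantized, the KV cache and activations all reduce the savings seen in practice.

```python
def estimate_weight_gib(n_params, bits_per_weight, group_size=None,
                        scale_bits=16, zero_bits=4):
    """Approximate weight storage in GiB (weights only, no KV cache)."""
    total_bits = n_params * bits_per_weight
    if group_size:
        # Assumption: each quantization group stores one scale and one zero point
        n_groups = n_params / group_size
        total_bits += n_groups * (scale_bits + zero_bits)
    return total_bits / 8 / 1024**3


n_params = 8e9  # nominal parameter count for the 8B model
fp16_gib = estimate_weight_gib(n_params, 16)
awq_gib = estimate_weight_gib(n_params, 4, group_size=128)
print(f"FP16: {fp16_gib:.1f} GiB, AWQ 4-bit: {awq_gib:.1f} GiB, "
      f"ratio: {fp16_gib / awq_gib:.2f}x")
```

The weights-only ratio is an upper bound on the savings; end-to-end memory use depends on context length and batch size.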

Features

  • Multilingual: Supports 100+ languages
  • Long Context: Native 32K context, extendable with YaRN to 131K tokens
  • Efficient Inference: Optimized for NVIDIA GPUs with Tensor Core support
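Extending the context with YaRN is usually configured through a `rope_scaling` entry in the model config. The sketch below only builds such an entry and checks the arithmetic behind the 32K and 131K figures. The key names follow the convention used in recent Transformers releases and the upstream Qwen3 documentation; treat them as assumptions to verify against your installed version. The helper name is hypothetical.

```python
NATIVE_CONTEXT = 32_768    # native window (32K tokens)
TARGET_CONTEXT = 131_072   # extended window with YaRN (131K tokens)


def yarn_rope_scaling(native, target):
    """Build a YaRN rope_scaling entry for the desired context extension."""
    return {
        "rope_type": "yarn",                          # assumed key name
        "factor": target / native,                    # scaling factor
        "original_max_position_embeddings": native,   # assumed key name
    }


rope_scaling = yarn_rope_scaling(NATIVE_CONTEXT, TARGET_CONTEXT)
print(rope_scaling)
```

A scaling factor applied statically affects all inputs, including short ones, so it is generally worth enabling only when prompts actually exceed the native window.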

Usage

With Hugging Face Transformers

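from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abhishekchohan/Qwen3-8B-AWQ"
# Loading AWQ checkpoints typically requires an AWQ backend (e.g. autoawq) and a CUDA GPU
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "Explain quantum computing."}]
# add_generation_prompt=True appends the assistant turn marker so the model produces a reply
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate, then decode only the newly produced tokens
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))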

With vLLM

# Set --tensor-parallel-size to the number of GPUs available (1 for a single GPU)
vllm serve abhishekchohan/Qwen3-8B-AWQ \
    --chat-template templates/chat_template.jinja \
    --tensor-parallel-size 4
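Once the server is running, it can be queried over its OpenAI-compatible HTTP API. The following is a minimal client sketch using only the Python standard library. It assumes vLLM's default address (`http://localhost:8000`) and that the served model name matches the repository id; the helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"   # assumption: vLLM default host and port
MODEL = "abhishekchohan/Qwen3-8B-AWQ"   # assumption: served under the repo id


def build_chat_request(prompt, max_tokens=256):
    """Build a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the prompt to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires a running server):
# print(chat("Explain quantum computing."))
```

Separating request construction from sending keeps the payload easy to inspect before any network call is made.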

Citation

If you use these models, please cite:

@misc{qwen3,
    title = {Qwen3 Technical Report},
    author = {Qwen Team},
    year = {2025},
    url = {https://github.com/QwenLM/Qwen3}
}