---
license: apache-2.0
tags:
  - gguf
  - qwen
  - qwen3
  - qwen3-coder
  - qwen3-coder-f32
  - qwen3-coder-30B
  - qwen3-coder-30B-f32
  - qwen3-coder-30B-gguf
  - qwen3-coder-30B-gguf-f32
  - llama.cpp
  - quantized
  - text-generation
  - reasoning
  - agent
  - multilingual
base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
author: geoffmunn
pipeline_tag: text-generation
language:
  - en
  - zh
  - es
  - fr
  - de
  - ru
  - ar
  - ja
  - ko
  - hi
---

Qwen3-Coder-30B-A3B-Instruct-f32-GGUF

This is a GGUF conversion of the Qwen/Qwen3-Coder-30B-A3B-Instruct language model, with quantizations generated from an f32 (32-bit float) base.

Converted for use with llama.cpp, LM Studio, OpenWebUI, GPT4All, and more.

Why f32?

This model uses FP32 (32-bit floating point) as its base precision. This is unusual for GGUF models because:

  • FP32 doubles memory usage vs FP16.
  • Modern LLMs (including Qwen3) are trained in mixed precision and do not benefit from FP32 at inference time.
  • FP32 is mainly useful for debugging, research, or cases that demand extreme numerical robustness.

F16 is probably the better choice for most users, but this f32 build lets you compare outputs and check whether the extra precision makes any practical difference (if any).
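
As a rough back-of-the-envelope check (assuming roughly 30B parameters; the exact count differs slightly), the memory math looks like this:

```python
# Approximate weight-storage cost of a ~30B-parameter model at different precisions.
params = 30e9                 # ~30 billion parameters (assumption)
fp32_gb = params * 4 / 1e9    # FP32 stores 4 bytes per weight
fp16_gb = params * 2 / 1e9    # FP16 stores 2 bytes per weight

print(f"FP32: ~{fp32_gb:.0f} GB, FP16: ~{fp16_gb:.0f} GB")
# FP32: ~120 GB, FP16: ~60 GB
```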

πŸ’‘ Key Features of Qwen3-Coder-30B-A3B-Instruct:

  • Mixture-of-Experts (MoE) design: 30B total parameters with roughly 3B active per token (the "A3B" in the name), keeping inference cost low for its size.
  • Instruction-tuned for code generation, reasoning, and agentic tool use.
  • Multilingual, covering English, Chinese, Spanish, French, German, Russian, Arabic, Japanese, Korean, and Hindi.

Available Quantizations (from f32)

| Level  | Quality       | Speed     | Size    | Recommendation |
|--------|---------------|-----------|---------|----------------|
| Q2_K   | Minimal       | ⚑ Fast   | 11.3 GB | Only on severely memory-constrained systems. |
| Q3_K_S | Low-Medium    | ⚑ Fast   | 13.3 GB | Minimal viability; avoid unless space-limited. |
| Q3_K_M | Low-Medium    | ⚑ Fast   | 14.7 GB | Acceptable for basic interaction. |
| Q4_K_S | Practical     | ⚑ Fast   | 17.5 GB | Good balance for mobile/embedded platforms. |
| Q4_K_M | Practical     | ⚑ Fast   | 18.6 GB | Best overall choice for most users. |
| Q5_K_S | Max Reasoning | 🐒 Medium | 21.1 GB | Slight quality gain; good for testing. |
| Q5_K_M | Max Reasoning | 🐒 Medium | 21.7 GB | Best quality available. Recommended. |
| Q6_K   | Near-FP16     | 🐌 Slow   | 25.1 GB | Diminishing returns; only if RAM allows. |
| Q8_0   | Lossless\*    | 🐌 Slow   | 32.5 GB | Maximum fidelity. Ideal for archival. |

\*Effectively lossless: Q8_0 output is practically indistinguishable from the unquantized base, at the cost of size and speed.
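
To fetch a single quantization programmatically, a minimal sketch using huggingface_hub follows; the repo id and filename are assumptions based on this card's naming and should be checked against the repo's file list:

```python
from huggingface_hub import hf_hub_download

# Repo id and filename are illustrative; confirm them in the repo's file list.
path = hf_hub_download(
    repo_id="geoffmunn/Qwen3-Coder-30B-A3B-Instruct-f32-GGUF",
    filename="Qwen3-Coder-30B-A3B-Instruct-f32-Q4_K_M.gguf",
)
print(path)  # local cache path of the downloaded GGUF file
```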

πŸ’‘ Recommendations by Use Case

  • πŸ’» Standard Laptop (i5/M1 Mac): Q5_K_M (optimal quality)
  • 🧠 Reasoning, Coding, Math: Q5_K_M or Q6_K
  • πŸ” RAG, Retrieval, Precision Tasks: Q6_K or Q8_0
  • πŸ€– Agent & Tool Integration: Q5_K_M
  • πŸ› οΈ Development & Testing: Test from Q4_K_M up to Q8_0

Usage

Load this model using:

  • OpenWebUI – self-hosted AI interface with RAG & tools
  • LM Studio – desktop app with GPU support
  • GPT4All – private, offline AI chatbot
  • Or directly via llama.cpp

Each quantized model includes its own README.md and shares a common MODELFILE.
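
As a minimal sketch of loading one of these files through the llama-cpp-python bindings (the model path is illustrative; point it at whichever quantization you downloaded):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-Coder-30B-A3B-Instruct-f32-Q4_K_M.gguf",  # illustrative path
    n_ctx=8192,        # context window; raise it if you have the RAM
    n_gpu_layers=-1,   # offload all layers to the GPU when one is available
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}]
)
print(response["choices"][0]["message"]["content"])
```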

Author

πŸ‘€ Geoff Munn (@geoffmunn)
πŸ”— Hugging Face Profile: https://huggingface.co/geoffmunn

Disclaimer

This is a community conversion for local inference. Not affiliated with Alibaba Cloud or the Qwen team.