# IQuest-Coder-V1-40B-Loop-Instruct-NVFP4

An NVFP4-AWQ quantized version of IQuestLab/IQuest-Coder-V1-40B-Loop-Instruct for efficient inference on NVIDIA Blackwell and Grace Hopper GPUs.
## Key Details
| Property | Value |
|---|---|
| Base Model | IQuestLab/IQuest-Coder-V1-40B-Loop-Instruct |
| Quantization | NVFP4 with AWQ (Activation-aware Weight Quantization) |
| Format | Safetensors |
| Size | 23.5 GB (~68% reduction from ~75 GB) |
| Recommended Hardware | NVIDIA DGX Spark, Grace Hopper, Blackwell GPUs |
## Quantization Details
This model was quantized with NVIDIA's `modelopt` library using the `NVFP4_AWQ_FULL_CFG` configuration:
- Phase 1: Activation statistics caching (~8 min)
- Phase 2: AWQ parameter search (~2 hours)
- Phase 3: Clip estimation (~2 hours)
AWQ calibration preserves accuracy by identifying and protecting "salient weights" based on activation magnitudes.
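The core AWQ idea can be sketched numerically: input channels with large activation magnitudes are scaled up inside the weight matrix before quantization (and the scale folded back out afterwards), so the weights that matter most get finer effective resolution. This is a minimal NumPy illustration, not the `modelopt` implementation; real AWQ grid-searches the scaling exponent, while this sketch fixes it at 0.25.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear layer: weights W (out_features x in_features) and a batch of
# calibration activations X. A few input channels carry much larger
# activations than the rest -- these are the "salient" channels.
W = rng.normal(size=(64, 128))
X = rng.normal(size=(256, 128))
X[:, :8] *= 20.0

def quantize_int4(w):
    """Naive symmetric 4-bit round-to-nearest with one scale per output row."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    return np.round(w / scale).clip(-8, 7) * scale

# Baseline: quantize W directly and measure output error on X @ W.T.
err_plain = np.mean((X @ W.T - X @ quantize_int4(W).T) ** 2)

# AWQ-style protection: scale salient input channels up before quantizing,
# then fold the per-channel scales back out. The fixed 0.25 exponent is an
# assumption for this demo; AWQ searches this value per layer.
s = np.abs(X).mean(axis=0) ** 0.25
s /= s.mean()
W_awq = quantize_int4(W * s) / s
err_awq = np.mean((X @ W.T - X @ W_awq.T) ** 2)

print(f"plain RTN error: {err_plain:.4f}, AWQ-style error: {err_awq:.4f}")
```

Because the output of the layer weights each channel's quantization error by that channel's activation energy, protecting the salient channels reduces the overall output error even though the other channels are quantized slightly more coarsely.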
## Usage with vLLM

```bash
vllm serve Elias-Schwegler/IQuest-Coder-V1-40B-Loop-Instruct-NVFP4 \
  --quantization modelopt \
  --trust-remote-code \
  --port 8000
```
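Once running, the server exposes vLLM's OpenAI-compatible API on port 8000. A minimal request sketch (the payload is only constructed here, not sent, so it runs standalone; the prompt is illustrative):

```python
import json

# Hypothetical chat-completions request for the vLLM server started above
# (endpoint: http://localhost:8000/v1/chat/completions).
payload = {
    "model": "Elias-Schwegler/IQuest-Coder-V1-40B-Loop-Instruct-NVFP4",
    "messages": [
        {"role": "user", "content": "Write a Python function that reverses a linked list."}
    ],
    # Sampling parameters recommended for the base model (see below).
    "temperature": 0.6,
    "top_p": 0.95,
    "max_tokens": 8192,
}
print(json.dumps(payload, indent=2))

# Send with, e.g.:
#   curl http://localhost:8000/v1/chat/completions \
#     -H "Content-Type: application/json" -d "$(cat payload.json)"
```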
## Usage with SGLang (DGX Spark)

```bash
docker run --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  lmsysorg/sglang:spark \
  --model Elias-Schwegler/IQuest-Coder-V1-40B-Loop-Instruct-NVFP4 \
  --quantization modelopt_fp4 \
  --trust-remote-code
```
## Sampling Parameters
As recommended for the base model:
- Temperature: 0.6
- Top-P: 0.95
- Top-K: 20
- Min-P: 0.0
- Max Tokens: 8192
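How these settings interact can be illustrated with plain NumPy (this is not the serving stack itself): temperature rescales the logits, top-k keeps only the 20 most likely tokens, and top-p then keeps the smallest subset of those whose probabilities sum to 0.95 (min-p of 0.0 disables that filter entirely).

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=1000)  # fake next-token logits over a 1000-token vocab

# Temperature = 0.6: sharpen the distribution before sampling.
probs = np.exp(logits / 0.6)
probs /= probs.sum()

# Top-k = 20: keep only the 20 most likely tokens, renormalized.
order = np.argsort(probs)[::-1]
keep = order[:20]
p = probs[keep] / probs[keep].sum()

# Top-p = 0.95: keep the smallest prefix of those whose mass reaches 0.95.
cut = np.searchsorted(np.cumsum(p), 0.95) + 1
final = keep[:cut]
print(len(final), "candidate tokens survive filtering")
```

Sampling then draws the next token from the surviving candidates; lower temperature and tighter top-k/top-p shrink that pool and make generations more deterministic.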
## License

Apache 2.0 (same as the base model).
## Acknowledgments
- Original model by IQuestLab
- Quantization performed using NVIDIA Model Optimizer