# Phi-4-mini-instruct INT4_SYM for Intel NPU

**First NPU-optimized Phi-4-mini model with correct quantization for Intel NPU!**
## Model Description
This is microsoft/Phi-4-mini-instruct (2.6B parameters) converted to OpenVINO IR format with NPU-specific INT4 symmetric quantization.
### Key Difference from Standard OpenVINO Models

**Critical discovery:** Intel NPU requires INT4_SYM (symmetric, channel-wise) quantization, not the INT4_ASYM (asymmetric, grouped) quantization used by standard OpenVINO pre-converted models.
| Quantization Type | NPU Compatibility |
|---|---|
| INT4_ASYM (group_size=64) | ❌ FAILS (MatMul errors) |
| INT4_SYM (channel-wise) | ✅ WORKS (this model) |
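To make the distinction concrete, here is a small numpy sketch of the two schemes (illustrative only; NNCF's actual implementation handles packing, mixed precision, and edge cases):

```python
import numpy as np

w = np.random.randn(4, 128).astype(np.float32)  # [out_channels, in_features]

# Symmetric, channel-wise (group_size=-1): one scale per output channel,
# zero-point fixed at 0, int4 range [-8, 7]
scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
w_sym = np.clip(np.round(w / scale), -8, 7).astype(np.int8)

# Asymmetric, grouped (group_size=64): a scale AND zero-point per group,
# uint4 range [0, 15] - tighter packing, but the resulting graph pattern
# is what the NPU compiler rejects (see the table above)
g = w.reshape(4, -1, 64)
lo, hi = g.min(axis=2, keepdims=True), g.max(axis=2, keepdims=True)
scale_a = (hi - lo) / 15.0
zero = np.round(-lo / scale_a)
w_asym = np.clip(np.round(g / scale_a) + zero, 0, 15).astype(np.uint8)

print(w_sym.shape, w_asym.shape)  # (4, 128) and (4, 2, 64)
```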
## Quantization Details
- Method: INT4_SYM (symmetric)
- Group size: -1 (channel-wise, not grouped)
- Calibration: AWQ + scale_estimation on wikitext2 dataset
- Distribution: 84% of weights INT4_SYM (128 layers), 16% INT8_ASYM (1 layer)
- Size: 2.13 GB
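The same configuration expressed directly against the NNCF API looks roughly like this (a sketch; the paths are hypothetical and the calibration plumbing is simplified):

```python
import nncf
import openvino as ov

# Hypothetical paths; calibration_samples would be wikitext2 prompts
# prepared as model inputs - see NNCF's weight-compression docs
model = ov.Core().read_model("phi4_mini_fp16/openvino_model.xml")

compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,  # symmetric - required for NPU
    group_size=-1,                           # channel-wise, not grouped
    awq=True,                                # activation-aware weight rounding
    scale_estimation=True,
    dataset=nncf.Dataset(calibration_samples),
)
ov.save_model(compressed, "phi4_mini_int4_sym/openvino_model.xml")
```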
## Performance on Intel NPU
Tested on Intel Core Ultra 7 155H (NPU driver v32.0.100.4297):
- Speed: 6.8 tok/s
- Compilation: 68.5s
- Inference: Stable, production-ready
Comparison with other models on the same hardware (Intel Core Ultra 7 155H):
- Qwen2.5-1.5B-Instruct (INT4_SYM): 10.7 tok/s (0.87 GB) - baseline performance
- Phi-4-mini-instruct (INT4_SYM): 6.8 tok/s (2.13 GB) - 73% more parameters, reasoning capabilities
- Performance ratio: ~64% of Qwen's speed, but a significantly more capable model
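The tok/s figures can be reproduced with a simple wall-clock measurement (a sketch; `model_dir` is assumed to be a local copy of this repository, as in the Usage section below):

```python
import time
from openvino_genai import LLMPipeline

pipe = LLMPipeline(model_dir, device="NPU")  # NPU compilation (~68.5s) happens here

n_tokens = 128
start = time.perf_counter()
pipe.generate("Explain quantum computing:",
              max_new_tokens=n_tokens, min_new_tokens=n_tokens)
elapsed = time.perf_counter() - start

print(f"{n_tokens / elapsed:.1f} tok/s (includes prefill time)")
```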
## Usage

### Requirements

```bash
pip install openvino-genai huggingface-hub
```
### Python API

```python
from huggingface_hub import snapshot_download
from openvino_genai import LLMPipeline

# LLMPipeline expects a local directory, so fetch the repo first
model_dir = snapshot_download("AhtnaGlen/phi-4-mini-instruct-int4-sym-npu-ov")

# Load and run on the Intel NPU
pipe = LLMPipeline(model_dir, device="NPU")

# Generate text
response = pipe.generate("Explain quantum computing:", max_new_tokens=100)
print(response)
```
### Streaming

openvino_genai streams through a callback rather than an iterator:

```python
# The streamer is called with each decoded chunk as it is produced;
# returning None (or False) tells generation to continue
pipe.generate("Write a story:", max_new_tokens=200,
              streamer=lambda subword: print(subword, end="", flush=True))
```
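Generation parameters can also be grouped into a `GenerationConfig` object instead of keyword arguments (a sketch reusing `pipe` from above):

```python
from openvino_genai import GenerationConfig

config = GenerationConfig()
config.max_new_tokens = 200
config.do_sample = True   # sample instead of greedy decoding
config.temperature = 0.7
config.top_p = 0.9

print(pipe.generate("Write a story:", config))
```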
## Why This Matters

Standard OpenVINO Phi-4 models (e.g., OpenVINO/Phi-4-mini-instruct-int4-ov) use INT4_ASYM quantization, which fails NPU compilation with errors like:

```
[ERROR] Channels count of input tensor shape and filter shape must be the same: 0 != 48
```
This model uses the correct NPU-optimized quantization as specified in Intel's NPU documentation:
```bash
# --sym: symmetric quantization (key for NPU!)
# --group-size -1: channel-wise, not grouped
optimum-cli export openvino -m microsoft/Phi-4-mini-instruct \
  --weight-format int4 \
  --sym \
  --group-size -1 \
  --awq --scale-estimation \
  --dataset wikitext2
```
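A successful `compile_model` call is the real smoke test, since NPU compilation is exactly where the wrong quantization fails (a minimal sketch; the output path is hypothetical, whatever `optimum-cli` wrote to):

```python
import openvino as ov

core = ov.Core()
# This is the step that throws the MatMul channel error for INT4_ASYM models
compiled = core.compile_model("phi4_mini_int4_sym/openvino_model.xml", "NPU")
print("Compiled for NPU:", compiled is not None)
```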
## Model Capabilities
- Instruction following: Fine-tuned for chat/instruction tasks
- Reasoning: Enhanced reasoning capabilities (Phi-4 series)
- Context length: 4096 tokens
- NPU acceleration: Full hardware offload to Intel NPU
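Since the model is instruction-tuned, multi-turn use maps onto `openvino_genai`'s chat helpers, which apply the chat template between turns (a sketch reusing `pipe` from the Usage section):

```python
pipe.start_chat()
print(pipe.generate("What is the NPU in a Core Ultra chip?", max_new_tokens=120))
print(pipe.generate("How does it differ from the integrated GPU?", max_new_tokens=120))
pipe.finish_chat()
```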
## Hardware Requirements
- Intel NPU: Core Ultra 7 155H (tested), or other NPU 3720/4000 series
- Driver: v32.0.100.4297 or newer
- OpenVINO: 2025.3.0 or newer
- Memory: ~3 GB for model + inference
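To verify the NPU and driver are visible to OpenVINO before loading the model (a minimal sketch):

```python
import openvino as ov

core = ov.Core()
if "NPU" in core.available_devices:
    # FULL_DEVICE_NAME is a standard OpenVINO device property
    print(core.get_property("NPU", "FULL_DEVICE_NAME"))
else:
    print("No NPU detected - check the driver installation")
```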
## Limitations
- NPU only: This model is quantized specifically for Intel NPU
- Speed trade-off: 6.8 tok/s vs Qwen2.5-1.5B @ 10.7 tok/s on Intel Core Ultra 7 155H
- Size vs capability: Larger model (2.13 GB) but enhanced reasoning and instruction-following
- Hardware specific: Performance validated on Intel Core Ultra 7 155H NPU
## Citation

If you use this model, please cite:

```bibtex
@misc{phi4-mini-npu-optimized,
  title        = {Phi-4-mini-instruct INT4\_SYM for Intel NPU},
  author       = {OpenVINO Community},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/AhtnaGlen/phi-4-mini-instruct-int4-sym-npu-ov}}
}
```
## Acknowledgments
- Base model: Microsoft Phi-4-mini-instruct
- Framework: Intel OpenVINO
- Quantization: NNCF (Neural Network Compression Framework)
- Discovery: Community finding on NPU quantization requirements
## License

MIT (following the base model's license).
## Model Card Contact
For issues or questions about NPU compatibility, please open an issue on the model repository.
**Note:** This model demonstrates the importance of quantization method selection for hardware-specific optimization. Always verify quantization parameters match target hardware requirements!