## NOTICE
No longer available on Hugging Face due to storage restrictions; archived here.
## Information
See K-EXAONE-236B-A23B MLX in action in the demonstration video.

The q6.5-bit mixed quantization typically achieves a perplexity of 1.128 in our testing:
| Quantization | Perplexity (lower is better) |
|---|---|
| q2.5 | 41.293 |
| q3.5 | 1.900 |
| q4.5 | 1.168 |
| q4.8 | 1.140 |
| q5.5 | 1.141 |
| q6.5 | 1.128 |
| q8.5 | 1.128 |
## Usage Notes
Tested on an M3 Ultra using the Inferencer app v1.9.1:
- Single inference: ~26.7 tokens/s @ 1000 tokens
- Batched inference: ~34 total tokens/s across two concurrent inferences
- Memory usage: ~180 GB
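The ~180 GB figure is consistent with a back-of-the-envelope weight-size estimate: 236B parameters at an effective 6.5 bits each. The sketch below shows that arithmetic; it counts weight bytes only and ignores KV cache, activations, and per-group scale/bias overhead, so it is a rough lower bound rather than a measured value.

```python
# Rough memory estimate for the q6.5 quantization: weight bytes only.
params = 236e9          # total parameter count, from the model name (236B)
bits_per_weight = 6.5   # effective bits per weight for the q6.5 mixed quant

weight_bytes = params * bits_per_weight / 8
print(f"{weight_bytes / 1024**3:.1f} GiB")  # ≈ 178.6 GiB, close to the ~180 GB observed
```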
Quantized with a modified version of MLX 0.30.

For more details, see the demonstration video or visit the K-EXAONE-236B-A23B model page.
## Model tree for inferencerlabs/K-EXAONE-236B-A23B-MLX-6.5bit

Base model: LGAI-EXAONE/K-EXAONE-236B-A23B