# ArtusDev/cerebras_GLM-4.6-REAP-218B-A32B-EXL3
EXL3 quants of [cerebras/GLM-4.6-REAP-218B-A32B](https://huggingface.co/cerebras/GLM-4.6-REAP-218B-A32B), quantized with exllamav3.
## Quants
| Quant | BPW | Head Bits | Size (GB) |
|---|---|---|---|
| 2.5_H6 | 2.5 | 6 | 70.36 |
| 2.97_H6 (optimized) | 2.97 | 6 | 82.81 |
| 3.0_H6 | 3.0 | 6 | 83.95 |
| 3.43_H6 (optimized) | 3.43 | 6 | 95.29 |
| 3.5_H6 | 3.5 | 6 | 97.46 |
| 3.91_H6 (optimized) | 3.91 | 6 | 108.08 |
| 4.0_H6 | 4.0 | 6 | 111.05 |
| 4.25_H6 | 4.25 | 6 | 117.76 |
| 5.0_H6 | 5.0 | 6 | 138.14 |
| 6.0_H6 | 6.0 | 6 | 165.24 |
| 8.0_H8 | 8.0 | 8 | 219.63 |
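The sizes above scale roughly linearly with bits per weight. A quick sanity check, assuming the ~218B total parameters implied by the model name (the listed files are slightly larger because head layers are kept at 6–8 bits and metadata adds overhead):

```python
# Rough size estimate from bits-per-weight, ignoring head-layer and
# metadata overhead. TOTAL_PARAMS is an assumption from "218B" in the name.
TOTAL_PARAMS = 218e9

def est_size_gb(bpw: float, params: float = TOTAL_PARAMS) -> float:
    """Estimated weight-file size in GB for a given bits-per-weight."""
    return params * bpw / 8 / 1e9

print(f"{est_size_gb(2.5):.1f} GB")  # prints 68.1 GB; table lists 70.36 GB
```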
## How to Download and Use Quants
You can download a quant at a specific size by targeting its revision with the Hugging Face CLI.
1. Install the Hugging Face CLI:

   ```shell
   pip install -U "huggingface_hub[cli]"
   ```

2. Download a specific quant:

   ```shell
   huggingface-cli download ArtusDev/cerebras_GLM-4.6-REAP-218B-A32B-EXL3 --revision "5.0bpw_H6" --local-dir ./
   ```
EXL3 quants can be run with any inference client that supports EXL3, such as TabbyAPI. Refer to its documentation for setup instructions.
## Acknowledgements
Made possible with cloud compute from [lium.io](https://lium.io).