Update README.md
README.md CHANGED

@@ -28,6 +28,12 @@ GGML files are for CPU + GPU inference using [llama.cpp](https://github.com/ggerganov/llama.cpp)
 * [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
 * [ctransformers](https://github.com/marella/ctransformers)
 
+## Update 9th July 2023: GGML k-quants now available
+
+Thanks to the work of LostRuins/concedo, it is now possible to provide 100% working GGML k-quants for models like this which have a non-standard vocab size (32,001).
+
+k-quants have been uploaded and will work with all llama.cpp clients without any changes required.
+
 ## Repositories available
 
 * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/WizardLM-13B-V1.1-GPTQ)
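For reference, a minimal sketch of loading one of the new k-quant files with [llama-cpp-python](https://github.com/abetlen/llama-cpp-python). The model filename and the USER/ASSISTANT prompt template below are assumptions following TheBloke's usual naming conventions, not taken from this repo's file list:

```python
# Minimal sketch: run a GGML k-quant on CPU (optionally offloading layers
# to GPU) with llama-cpp-python. The model filename is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="wizardlm-13b-v1.1.ggmlv3.q4_K_M.bin",  # assumed k-quant filename
    n_ctx=2048,       # context window size
    n_gpu_layers=32,  # >0 only if llama-cpp-python was built with GPU support
)

output = llm(
    "USER: Explain what a k-quant is in one sentence.\nASSISTANT:",
    max_tokens=96,
    stop=["USER:"],
)
print(output["choices"][0]["text"])
```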
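And the equivalent with [ctransformers](https://github.com/marella/ctransformers), which can pull the file straight from the Hub; the GGML repo id and filename are again assumptions, inferred from the GPTQ link above:

```python
# Minimal sketch: load the GGML k-quant via ctransformers directly from the
# Hugging Face Hub. Repo id and filename are assumptions.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/WizardLM-13B-V1.1-GGML",                 # assumed repo id
    model_file="wizardlm-13b-v1.1.ggmlv3.q4_K_M.bin",  # assumed filename
    model_type="llama",
)

print(llm("USER: Explain what a k-quant is in one sentence.\nASSISTANT:", max_new_tokens=96))
```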