ReallyFloppyPenguin
/

MiniCPM4-8B-GGUF

+---
+language:
+- en
+library_name: gguf
+base_model: openbmb/MiniCPM4-8B
+tags:
+- gguf
+- quantized
+- llama.cpp
+license: apache-2.0
+---
+# openbmb/MiniCPM4-8B - GGUF
+This repository contains GGUF quantizations of [openbmb/MiniCPM4-8B](https://huggingface.co/openbmb/MiniCPM4-8B).
+## About GGUF
+GGUF is a quantization method that allows you to run large language models on consumer hardware by reducing the precision of the model weights.
+## Files
+| Filename | Quant type | File Size | Description |
+| -------- | ---------- | --------- | ----------- |
+| model-f16.gguf | f16 | Large | Original precision |
+| model-q4_0.gguf | Q4_0 | Small | 4-bit quantization |
+| model-q4_1.gguf | Q4_1 | Small | 4-bit quantization (higher quality) |
+| model-q5_0.gguf | Q5_0 | Medium | 5-bit quantization |
+| model-q5_1.gguf | Q5_1 | Medium | 5-bit quantization (higher quality) |
+| model-q8_0.gguf | Q8_0 | Large | 8-bit quantization |
+## Usage
+You can use these models with llama.cpp or any other GGUF-compatible inference engine.
+### llama.cpp
+```bash
+./llama-cli -m model-q4_0.gguf -p "Your prompt here"
+```
+### Python (using llama-cpp-python)
+```python
+from llama_cpp import Llama
+llm = Llama(model_path="model-q4_0.gguf")
+output = llm("Your prompt here", max_tokens=512)
+print(output['choices'][0]['text'])
+```
+## Original Model
+This is a quantized version of [openbmb/MiniCPM4-8B](https://huggingface.co/openbmb/MiniCPM4-8B). Please refer to the original model card for more information about the model's capabilities, training data, and usage guidelines.
+## Conversion Details
+- Converted using llama.cpp
+- Original model downloaded from Hugging Face
+- Multiple quantization levels provided for different use cases
+## License
+This model inherits the license from the original model. Please check the original model's license for usage terms.