Add Q2–Q8_0 quantized models with per-model cards, MODELFILE, and auto-upload
- .gitattributes +9 -0
- .prepare_and_upload_qwen3-0.6B.sh.swp +0 -0
- MODELFILE +25 -0
- Qwen3-1.7B-Q2_K/README.md +92 -0
- Qwen3-1.7B-Q3_K_M/README.md +92 -0
- Qwen3-1.7B-Q3_K_S/README.md +92 -0
- Qwen3-1.7B-Q4_K_M/README.md +92 -0
- Qwen3-1.7B-Q4_K_S/README.md +92 -0
- Qwen3-1.7B-Q5_K_M/README.md +92 -0
- Qwen3-1.7B-Q5_K_S/README.md +92 -0
- Qwen3-1.7B-Q6_K/README.md +92 -0
- Qwen3-1.7B-Q8_0/README.md +92 -0
- Qwen3-1.7B-f16:Q2_K.gguf +3 -0
- Qwen3-1.7B-f16:Q3_K_M.gguf +3 -0
- Qwen3-1.7B-f16:Q3_K_S.gguf +3 -0
- Qwen3-1.7B-f16:Q4_K_M.gguf +3 -0
- Qwen3-1.7B-f16:Q4_K_S.gguf +3 -0
- Qwen3-1.7B-f16:Q5_K_M.gguf +3 -0
- Qwen3-1.7B-f16:Q5_K_S.gguf +3 -0
- Qwen3-1.7B-f16:Q6_K.gguf +3 -0
- Qwen3-1.7B-f16:Q8_0.gguf +3 -0
- README.md +75 -0
- SHA256SUMS.txt +9 -0
.gitattributes
CHANGED
@@ -33,3 +33,12 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+Qwen3-1.7B-f16:Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-1.7B-f16:Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-1.7B-f16:Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-1.7B-f16:Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-1.7B-f16:Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-1.7B-f16:Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-1.7B-f16:Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-1.7B-f16:Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-1.7B-f16:Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
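Patterns like these are normally appended with `git lfs track` rather than edited by hand; a minimal sketch covering the nine files added in this commit:

```bash
# Register each quantized GGUF with Git LFS; each call appends the
# matching filter line to .gitattributes.
for QTYPE in Q2_K Q3_K_M Q3_K_S Q4_K_M Q4_K_S Q5_K_M Q5_K_S Q6_K Q8_0; do
  git lfs track "Qwen3-1.7B-f16:${QTYPE}.gguf"
done
git add .gitattributes
```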
.prepare_and_upload_qwen3-0.6B.sh.swp
ADDED
Binary file (1.02 kB)
MODELFILE
ADDED
@@ -0,0 +1,25 @@
+# MODELFILE for Qwen3-1.7B-GGUF
+# Used by LM Studio, OpenWebUI, GPT4All, etc.
+
+context_length: 32768
+embedding: false
+f16: cpu
+
+# Chat template using ChatML (the format Qwen uses)
+prompt_template: >-
+  <|im_start|>system
+  You are a helpful assistant.<|im_end|>
+  <|im_start|>user
+  {prompt}<|im_end|>
+  <|im_start|>assistant
+
+# Stop sequences help end generation cleanly
+stop: "<|im_end|>"
+stop: "<|im_start|>"
+
+# Default sampling
+temperature: 0.6
+top_p: 0.95
+top_k: 20
+min_p: 0.0
+repeat_penalty: 1.1
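These defaults map directly onto llama.cpp's CLI flags; a quick sketch of chatting with one of the quants below using the same sampling settings (the filename is one of the files added in this commit):

```bash
# Interactive chat with the Q4_K_M quant using the MODELFILE defaults.
# -cnv turns on conversation mode, which applies the ChatML template
# embedded in the GGUF metadata.
./llama-cli -m Qwen3-1.7B-f16:Q4_K_M.gguf \
  --ctx-size 32768 \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 --repeat-penalty 1.1 \
  -cnv
```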
Qwen3-1.7B-Q2_K/README.md
ADDED
@@ -0,0 +1,92 @@
+---
+license: apache-2.0
+tags:
+- gguf
+- qwen
+- llama.cpp
+- quantized
+- text-generation
+- chat
+base_model: $BASE_REPO
+author: geoffmunn
+---
+
+# ${MODEL_NAME}-${QTYPE}
+
+Quantized version of [${BASE_REPO}](https://huggingface.co/${BASE_REPO}) at **${QTYPE}** level, derived from **${INPUT_PRECISION}** base weights.
+
+## Model Info
+
+- **Format**: GGUF (for llama.cpp and compatible runtimes)
+- **Size**: ${FILE_SIZE}
+- **Precision**: ${QTYPE}
+- **Base Model**: [${BASE_REPO}](https://huggingface.co/${BASE_REPO})
+- **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+## Quality & Performance
+
+| Metric | Value |
+|-------|-------|
+| **Quality** | $(echo "${RECOMMENDATIONS[$QTYPE]}" | cut -d';' -f1 | xargs) |
+| **Speed** | $(if [[ "$QTYPE" =~ ^(Q2_K|Q3_K_S|Q3_K_M|Q4_K_S|Q4_K_M)$ ]]; then echo "🚀 Fast"; elif [[ "$QTYPE" =~ ^(Q5_K_S|Q5_K_M|Q6_K|Q8_0)$ ]]; then echo "🐢 Medium"; else echo "🐌 Slow"; fi) |
+| **RAM Required** | $(case $QTYPE in
+    Q2_K) echo "~0.9 GB" ;;
+    Q3_K_S) echo "~1.1 GB" ;;
+    Q3_K_M) echo "~1.3 GB" ;;
+    Q4_K_S) echo "~1.4 GB" ;;
+    Q4_K_M) echo "~1.5 GB" ;;
+    Q5_K_S) echo "~1.6 GB" ;;
+    Q5_K_M) echo "~1.7 GB" ;;
+    Q6_K) echo "~2.0 GB" ;;
+    Q8_0) echo "~2.3 GB" ;;
+    *) echo "~? GB" ;;
+  esac) |
+| **Recommendation** | ${RECOMMENDATIONS[$QTYPE]} |
+
+## Prompt Template (ChatML)
+
+This model uses the **ChatML** prompt format used by Qwen:
+
+```text
+<|im_start|>system
+You are a helpful assistant.<|im_end|>
+<|im_start|>user
+{prompt}<|im_end|>
+<|im_start|>assistant
+```
+
+Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+## Generation Parameters
+
+Recommended defaults:
+
+| Parameter | Value |
+|---------|-------|
+| Temperature | 0.6 |
+| Top-P | 0.95 |
+| Top-K | 20 |
+| Min-P | 0.0 |
+| Repeat Penalty | 1.1 |
+
+Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+## Verification
+
+Check integrity:
+
+```bash
+sha256sum -c ../SHA256SUMS.txt
+```
+
+## Usage
+
+Compatible with:
+- [LM Studio](https://lmstudio.ai)
+- [OpenWebUI](https://openwebui.com)
+- [GPT4All](https://gpt4all.io)
+- Directly via llama.cpp
+
+## License
+
+Apache 2.0 – see base model for full terms.
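Note that the card above was committed with its shell fragments (`$BASE_REPO`, `${QTYPE}`, the `$(case ...)` block) unexpanded. A minimal sketch of how such a card is presumably generated, using an unquoted heredoc so variables and command substitutions expand at write time (the variable names come from the template itself; the script is an assumption, not part of this commit):

```bash
#!/usr/bin/env bash
# Hypothetical per-quant card generator. An unquoted heredoc
# (<<EOF, not <<'EOF') expands ${VAR} and $(...) as it writes.
MODEL_NAME="Qwen3-1.7B"
BASE_REPO="Qwen/Qwen3-1.7B"
INPUT_PRECISION="f16"
declare -A RECOMMENDATIONS=(
  [Q4_K_M]="✅ Balanced; best overall for general use"  # assumed entry
)

QTYPE="Q4_K_M"
FILE_SIZE=$(du -h "${MODEL_NAME}-${INPUT_PRECISION}:${QTYPE}.gguf" | cut -f1)

mkdir -p "${MODEL_NAME}-${QTYPE}"
cat > "${MODEL_NAME}-${QTYPE}/README.md" <<EOF
# ${MODEL_NAME}-${QTYPE}

Quantized version of [${BASE_REPO}](https://huggingface.co/${BASE_REPO})
at **${QTYPE}** level, derived from **${INPUT_PRECISION}** base weights.

- **Size**: ${FILE_SIZE}
- **Recommendation**: ${RECOMMENDATIONS[$QTYPE]}
EOF
```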
Qwen3-1.7B-Q3_K_M/README.md
ADDED
@@ -0,0 +1,92 @@
(92 lines, identical to Qwen3-1.7B-Q2_K/README.md above)
Qwen3-1.7B-Q3_K_S/README.md
ADDED
@@ -0,0 +1,92 @@
(92 lines, identical to Qwen3-1.7B-Q2_K/README.md above)
Qwen3-1.7B-Q4_K_M/README.md
ADDED
@@ -0,0 +1,92 @@
(92 lines, identical to Qwen3-1.7B-Q2_K/README.md above)
Qwen3-1.7B-Q4_K_S/README.md
ADDED
@@ -0,0 +1,92 @@
(92 lines, identical to Qwen3-1.7B-Q2_K/README.md above)
Qwen3-1.7B-Q5_K_M/README.md
ADDED
@@ -0,0 +1,92 @@
(92 lines, identical to Qwen3-1.7B-Q2_K/README.md above)
Qwen3-1.7B-Q5_K_S/README.md
ADDED
@@ -0,0 +1,92 @@
(92 lines, identical to Qwen3-1.7B-Q2_K/README.md above)
Qwen3-1.7B-Q6_K/README.md
ADDED
@@ -0,0 +1,92 @@
(92 lines, identical to Qwen3-1.7B-Q2_K/README.md above)
Qwen3-1.7B-Q8_0/README.md
ADDED
@@ -0,0 +1,92 @@
(92 lines, identical to Qwen3-1.7B-Q2_K/README.md above)
Qwen3-1.7B-f16:Q2_K.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8111eca8604b050a55a8c370693b6dfc14cb4dd283bbdba86612a423686eb350
+size 879896768

Qwen3-1.7B-f16:Q3_K_M.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f90c4aba543ee3d50c962574cf1d9daa58d6dc0a1f59fdeee5564eaad71b02b3
+size 1073242304

Qwen3-1.7B-f16:Q3_K_S.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a53a624119885c54c2cd7592e2e7d45b6992c35a8b585cff698e3017830d9517
+size 1000956096

Qwen3-1.7B-f16:Q4_K_M.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b678ed9ba30b2cd7b9546e0a16dd3e28e5d9986b3b2dfe0dc7442176d4f44015
+size 1282439360

Qwen3-1.7B-f16:Q4_K_S.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f22681a7f8ad52f933888ddd57d0cde806af99c0bfb9ad7cbaf3ccfa2127ee4b
+size 1235220672

Qwen3-1.7B-f16:Q5_K_M.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:20d8d6dfb5d25b4b372e3a23e7da6727697dbfc6eb885d363b1dfcbced645759
+size 1471805632

Qwen3-1.7B-f16:Q5_K_S.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8c1d0728d84cdbdc8e7dd862697b3870b2f3ff820145533ddc8a9ac01d411c6e
+size 1444509888

Qwen3-1.7B-f16:Q6_K.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7ccfe899d107d9517170c720087bf72f39fa284a1e51ace561221fda50b37956
+size 1673007296

Qwen3-1.7B-f16:Q8_0.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:80266a1383b16459eee9f267f01eade72affad3bf4fb28f4c1705d66d9bd7222
+size 2165039296
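Each pointer above corresponds to one quantization pass over the same f16 conversion. A sketch of the usual llama.cpp workflow that produces files like these (the tool names match current llama.cpp; the exact invocation used for this commit is an assumption):

```bash
# One-time conversion of the HF checkpoint to an f16 GGUF, then one
# quantization pass per level from that shared base.
python convert_hf_to_gguf.py ./Qwen3-1.7B --outtype f16 \
  --outfile Qwen3-1.7B-f16.gguf

for QTYPE in Q2_K Q3_K_S Q3_K_M Q4_K_S Q4_K_M Q5_K_S Q5_K_M Q6_K Q8_0; do
  ./llama-quantize Qwen3-1.7B-f16.gguf "Qwen3-1.7B-f16:${QTYPE}.gguf" "${QTYPE}"
done
```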
README.md
ADDED
@@ -0,0 +1,75 @@
+---
+license: apache-2.0
+tags:
+- gguf
+- qwen
+- llama.cpp
+- quantized
+- text-generation
+- chat
+- reasoning
+base_model: Qwen/Qwen3-1.7B
+author: geoffmunn
+pipeline_tag: text-generation
+language:
+- en
+- zh
+---
+
+# Qwen3-1.7B-GGUF
+
+This is a **GGUF-quantized version** of the **[Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)** language model, converted for use with `llama.cpp` and compatible inference engines such as OpenWebUI, LM Studio, and GPT4All.
+
+The **Qwen3-1.7B** model is a lightweight yet capable LLM, ideal for local deployment on consumer hardware. It balances speed and quality for everyday tasks like casual conversation, summarization, code snippets, and personal AI assistance, all while running fully offline.
+
+## Available Quantizations (from f16)
+
+These variants were built from an **f16** base model to ensure consistency across quant levels.
+
+| Level | Quality | Speed | Size Est. | Recommendation |
+|----------|--------------|----------|-----------|----------------|
+| Q2_K | Very Low | ⚡ Fastest | ~0.9 GB | Only on very weak devices; avoid for reasoning. |
+| Q3_K_S | Low | ⚡ Fast | ~1.1 GB | Minimal viability; basic completion only. |
+| Q3_K_M | Low-Medium | ⚡ Fast | ~1.3 GB | Acceptable for simple chat on older systems. |
+| Q4_K_S | Medium | 🚀 Fast | ~1.4 GB | Good balance for low-end laptops or Mac Minis. |
+| Q4_K_M | ✅ Balanced | 🚀 Fast | ~1.5 GB | Best overall for general use on average hardware. |
+| Q5_K_S | High | 🐢 Medium | ~1.6 GB | Better reasoning; slightly faster than Q5_K_M. |
+| Q5_K_M | ✅✅ High | 🐢 Medium | ~1.7 GB | Top pick for coding, logic, and deeper interactions. |
+| Q6_K | 🔥 Near-FP16 | 🐌 Slow | ~2.0 GB | Excellent fidelity; great for RAG and retrieval. |
+| Q8_0 | 🏆 Lossless* | 🐌 Slow | ~2.3 GB | Maximum accuracy; recommended when precision matters most. |
+
+> 💡 **Recommendations by Use Case**
+>
+> - 💻 **Low-end CPU / Raspberry Pi / Old Laptop**: `Q4_K_M`
+> - 🖥️ **Standard Laptop (Intel i5/M1 Mac)**: `Q5_K_M` (optimal balance)
+> - 🧠 **Reasoning, Coding, Math**: `Q5_K_M` or `Q6_K`
+> - 🔍 **RAG, Retrieval, Precision Tasks**: `Q6_K` or `Q8_0`
+> - 📦 **Storage-Constrained Devices**: `Q4_K_S` or `Q4_K_M`
+> - 🛠️ **Development & Testing**: Test from `Q4_K_M` up to `Q8_0` for robustness.
+
+## Usage
+
+Load this model using:
+- [OpenWebUI](https://openwebui.com)
+- [LM Studio](https://lmstudio.ai)
+- [GPT4All](https://gpt4all.io)
+- Or directly via `llama.cpp`
+
+Each quantized model includes its own `README.md` and shares a common `MODELFILE` for optimal configuration.
+
+## Verification
+
+Use `SHA256SUMS.txt` to verify file integrity:
+
+```bash
+sha256sum -c SHA256SUMS.txt
+```
+
+## Author
+
+👤 Geoff Munn (@geoffmunn)
+🔗 [Hugging Face Profile](https://huggingface.co/geoffmunn)
+
+## Disclaimer
+
+This is a community conversion for local inference. Not affiliated with Alibaba Cloud or the Qwen team.
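Since every quant level sits in one repo, a single file can be fetched instead of cloning everything; a sketch with `huggingface-cli` (the repo id is assumed from the author and model name above):

```bash
# Download only the Q4_K_M quant plus the checksum manifest.
huggingface-cli download geoffmunn/Qwen3-1.7B-GGUF \
  "Qwen3-1.7B-f16:Q4_K_M.gguf" SHA256SUMS.txt --local-dir .
```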
SHA256SUMS.txt
ADDED
@@ -0,0 +1,9 @@
+8111eca8604b050a55a8c370693b6dfc14cb4dd283bbdba86612a423686eb350  Qwen3-1.7B-f16:Q2_K.gguf
+f90c4aba543ee3d50c962574cf1d9daa58d6dc0a1f59fdeee5564eaad71b02b3  Qwen3-1.7B-f16:Q3_K_M.gguf
+a53a624119885c54c2cd7592e2e7d45b6992c35a8b585cff698e3017830d9517  Qwen3-1.7B-f16:Q3_K_S.gguf
+b678ed9ba30b2cd7b9546e0a16dd3e28e5d9986b3b2dfe0dc7442176d4f44015  Qwen3-1.7B-f16:Q4_K_M.gguf
+f22681a7f8ad52f933888ddd57d0cde806af99c0bfb9ad7cbaf3ccfa2127ee4b  Qwen3-1.7B-f16:Q4_K_S.gguf
+20d8d6dfb5d25b4b372e3a23e7da6727697dbfc6eb885d363b1dfcbced645759  Qwen3-1.7B-f16:Q5_K_M.gguf
+8c1d0728d84cdbdc8e7dd862697b3870b2f3ff820145533ddc8a9ac01d411c6e  Qwen3-1.7B-f16:Q5_K_S.gguf
+7ccfe899d107d9517170c720087bf72f39fa284a1e51ace561221fda50b37956  Qwen3-1.7B-f16:Q6_K.gguf
+80266a1383b16459eee9f267f01eade72affad3bf4fb28f4c1705d66d9bd7222  Qwen3-1.7B-f16:Q8_0.gguf
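A manifest like this can be generated and checked with coreutils alone; a short sketch:

```bash
# Regenerate the manifest after adding or replacing quants ...
sha256sum Qwen3-1.7B-f16:*.gguf > SHA256SUMS.txt

# ... and verify any downloaded file against it.
sha256sum -c SHA256SUMS.txt
```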