geoffmunn committed
Commit 6d6402f · verified · 1 Parent(s): 7aebd5c

Add Q2–Q8_0 quantized models with per-model cards, MODELFILE, and auto-upload

.gitattributes CHANGED
@@ -33,3 +33,12 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ Qwen3-1.7B-f16:Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-1.7B-f16:Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-1.7B-f16:Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-1.7B-f16:Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-1.7B-f16:Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-1.7B-f16:Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-1.7B-f16:Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-1.7B-f16:Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-1.7B-f16:Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
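These added entries are exactly what `git lfs track` writes into `.gitattributes`. As a minimal sketch, the same effect could be had with one glob instead of nine explicit filenames (a hypothetical simplification, not what this commit did):

```bash
# From the repo root: track every GGUF file with Git LFS.
# This appends "*.gguf filter=lfs diff=lfs merge=lfs -text" to .gitattributes.
git lfs track "*.gguf"
git add .gitattributes
```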
.prepare_and_upload_qwen3-0.6B.sh.swp ADDED
Binary file (1.02 kB).
 
MODELFILE ADDED
@@ -0,0 +1,25 @@
+ # MODELFILE for Qwen3-1.7B-GGUF
+ # Used by LM Studio, OpenWebUI, GPT4All, etc.
+
+ context_length: 32768
+ embedding: false
+ f16: cpu
+
+ # Chat template using ChatML (the format used by Qwen)
+ prompt_template: >-
+   <|im_start|>system
+   You are a helpful assistant.<|im_end|>
+   <|im_start|>user
+   {prompt}<|im_end|>
+   <|im_start|>assistant
+
+ # Stop sequences help end generation cleanly
+ stop: "<|im_end|>"
+ stop: "<|im_start|>"
+
+ # Default sampling
+ temperature: 0.6
+ top_p: 0.95
+ top_k: 20
+ min_p: 0.0
+ repeat_penalty: 1.1
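Runtimes that read a MODELFILE pick these settings up automatically. When driving llama.cpp directly, the same defaults map onto its sampling flags; a minimal sketch, assuming a locally built `llama-cli` binary and a Q4_K_M file in the working directory:

```bash
# Apply the MODELFILE defaults when invoking llama.cpp directly (paths assumed).
# -cnv starts an interactive chat using the model's built-in ChatML template.
./llama-cli -m Qwen3-1.7B-f16:Q4_K_M.gguf -c 32768 -cnv \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 --repeat-penalty 1.1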
Qwen3-1.7B-Q2_K/README.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - chat
+ base_model: Qwen/Qwen3-1.7B
+ author: geoffmunn
+ ---
+
+ # Qwen3-1.7B-Q2_K
+
+ Quantized version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) at the **Q2_K** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 0.88 GB
+ - **Precision**: Q2_K
+ - **Base Model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | **Quality** | Very Low |
+ | **Speed** | 🚀 Fast |
+ | **RAM Required** | ~0.9 GB |
+ | **Recommendation** | Only on very weak devices; avoid for reasoning. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this template in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see the base model for full terms.
Qwen3-1.7B-Q3_K_M/README.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - chat
+ base_model: Qwen/Qwen3-1.7B
+ author: geoffmunn
+ ---
+
+ # Qwen3-1.7B-Q3_K_M
+
+ Quantized version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) at the **Q3_K_M** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 1.07 GB
+ - **Precision**: Q3_K_M
+ - **Base Model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | **Quality** | Low-Medium |
+ | **Speed** | 🚀 Fast |
+ | **RAM Required** | ~1.3 GB |
+ | **Recommendation** | Acceptable for simple chat on older systems. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this template in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see the base model for full terms.
Qwen3-1.7B-Q3_K_S/README.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - chat
+ base_model: Qwen/Qwen3-1.7B
+ author: geoffmunn
+ ---
+
+ # Qwen3-1.7B-Q3_K_S
+
+ Quantized version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) at the **Q3_K_S** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 1.00 GB
+ - **Precision**: Q3_K_S
+ - **Base Model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | **Quality** | Low |
+ | **Speed** | 🚀 Fast |
+ | **RAM Required** | ~1.1 GB |
+ | **Recommendation** | Minimal viability; basic completion only. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this template in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see the base model for full terms.
Qwen3-1.7B-Q4_K_M/README.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - chat
+ base_model: Qwen/Qwen3-1.7B
+ author: geoffmunn
+ ---
+
+ # Qwen3-1.7B-Q4_K_M
+
+ Quantized version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) at the **Q4_K_M** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 1.28 GB
+ - **Precision**: Q4_K_M
+ - **Base Model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | **Quality** | ✅ Balanced |
+ | **Speed** | 🚀 Fast |
+ | **RAM Required** | ~1.5 GB |
+ | **Recommendation** | Best overall for general use on average hardware. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this template in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see the base model for full terms.
Qwen3-1.7B-Q4_K_S/README.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - chat
+ base_model: Qwen/Qwen3-1.7B
+ author: geoffmunn
+ ---
+
+ # Qwen3-1.7B-Q4_K_S
+
+ Quantized version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) at the **Q4_K_S** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 1.24 GB
+ - **Precision**: Q4_K_S
+ - **Base Model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | **Quality** | Medium |
+ | **Speed** | 🚀 Fast |
+ | **RAM Required** | ~1.4 GB |
+ | **Recommendation** | Good balance for low-end laptops or Mac Minis. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this template in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see the base model for full terms.
Qwen3-1.7B-Q5_K_M/README.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - chat
+ base_model: Qwen/Qwen3-1.7B
+ author: geoffmunn
+ ---
+
+ # Qwen3-1.7B-Q5_K_M
+
+ Quantized version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) at the **Q5_K_M** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 1.47 GB
+ - **Precision**: Q5_K_M
+ - **Base Model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | **Quality** | ✅✅ High |
+ | **Speed** | 🐢 Medium |
+ | **RAM Required** | ~1.7 GB |
+ | **Recommendation** | Top pick for coding, logic, and deeper interactions. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this template in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see the base model for full terms.
Qwen3-1.7B-Q5_K_S/README.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - chat
+ base_model: Qwen/Qwen3-1.7B
+ author: geoffmunn
+ ---
+
+ # Qwen3-1.7B-Q5_K_S
+
+ Quantized version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) at the **Q5_K_S** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 1.44 GB
+ - **Precision**: Q5_K_S
+ - **Base Model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | **Quality** | High |
+ | **Speed** | 🐢 Medium |
+ | **RAM Required** | ~1.6 GB |
+ | **Recommendation** | Better reasoning; slightly faster than Q5_K_M. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this template in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see the base model for full terms.
Qwen3-1.7B-Q6_K/README.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - chat
+ base_model: Qwen/Qwen3-1.7B
+ author: geoffmunn
+ ---
+
+ # Qwen3-1.7B-Q6_K
+
+ Quantized version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) at the **Q6_K** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 1.67 GB
+ - **Precision**: Q6_K
+ - **Base Model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | **Quality** | 🔥 Near-FP16 |
+ | **Speed** | 🐢 Medium |
+ | **RAM Required** | ~2.0 GB |
+ | **Recommendation** | Excellent fidelity; great for RAG and retrieval. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this template in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see the base model for full terms.
Qwen3-1.7B-Q8_0/README.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - chat
+ base_model: Qwen/Qwen3-1.7B
+ author: geoffmunn
+ ---
+
+ # Qwen3-1.7B-Q8_0
+
+ Quantized version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) at the **Q8_0** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 2.17 GB
+ - **Precision**: Q8_0
+ - **Base Model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | **Quality** | 🏆 Lossless* |
+ | **Speed** | 🐢 Medium |
+ | **RAM Required** | ~2.3 GB |
+ | **Recommendation** | Maximum accuracy; recommended when precision matters most. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this template in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see the base model for full terms.
Qwen3-1.7B-f16:Q2_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8111eca8604b050a55a8c370693b6dfc14cb4dd283bbdba86612a423686eb350
+ size 879896768
Qwen3-1.7B-f16:Q3_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f90c4aba543ee3d50c962574cf1d9daa58d6dc0a1f59fdeee5564eaad71b02b3
+ size 1073242304
Qwen3-1.7B-f16:Q3_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a53a624119885c54c2cd7592e2e7d45b6992c35a8b585cff698e3017830d9517
+ size 1000956096
Qwen3-1.7B-f16:Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b678ed9ba30b2cd7b9546e0a16dd3e28e5d9986b3b2dfe0dc7442176d4f44015
+ size 1282439360
Qwen3-1.7B-f16:Q4_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f22681a7f8ad52f933888ddd57d0cde806af99c0bfb9ad7cbaf3ccfa2127ee4b
+ size 1235220672
Qwen3-1.7B-f16:Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:20d8d6dfb5d25b4b372e3a23e7da6727697dbfc6eb885d363b1dfcbced645759
+ size 1471805632
Qwen3-1.7B-f16:Q5_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8c1d0728d84cdbdc8e7dd862697b3870b2f3ff820145533ddc8a9ac01d411c6e
+ size 1444509888
Qwen3-1.7B-f16:Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7ccfe899d107d9517170c720087bf72f39fa284a1e51ace561221fda50b37956
+ size 1673007296
Qwen3-1.7B-f16:Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:80266a1383b16459eee9f267f01eade72affad3bf4fb28f4c1705d66d9bd7222
+ size 2165039296
README.md ADDED
@@ -0,0 +1,75 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - chat
+ - reasoning
+ base_model: Qwen/Qwen3-1.7B
+ author: geoffmunn
+ pipeline_tag: text-generation
+ language:
+ - en
+ - zh
+ ---
+
+ # Qwen3-1.7B-GGUF
+
+ This is a **GGUF-quantized version** of the **[Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)** language model, converted for use with `llama.cpp` and compatible inference engines such as OpenWebUI, LM Studio, and GPT4All.
+
+ **Qwen3-1.7B** is a lightweight yet capable LLM, ideal for local deployment on consumer hardware. It balances speed and quality for everyday tasks like casual conversation, summarization, code snippets, and personal AI assistance, all while running fully offline.
+
+ ## Available Quantizations (from f16)
+
+ These variants were built from an **f16** base model to ensure consistency across quant levels.
+
+ | Level | Quality | Speed | Size Est. | Recommendation |
+ |----------|--------------|----------|-----------|----------------|
+ | Q2_K | Very Low | ⚡ Fastest | ~0.9 GB | Only on very weak devices; avoid for reasoning. |
+ | Q3_K_S | Low | ⚡ Fast | ~1.1 GB | Minimal viability; basic completion only. |
+ | Q3_K_M | Low-Medium | ⚡ Fast | ~1.3 GB | Acceptable for simple chat on older systems. |
+ | Q4_K_S | Medium | 🚀 Fast | ~1.4 GB | Good balance for low-end laptops or Mac Minis. |
+ | Q4_K_M | ✅ Balanced | 🚀 Fast | ~1.5 GB | Best overall for general use on average hardware. |
+ | Q5_K_S | High | 🐢 Medium | ~1.6 GB | Better reasoning; slightly faster than Q5_K_M. |
+ | Q5_K_M | ✅✅ High | 🐢 Medium | ~1.7 GB | Top pick for coding, logic, and deeper interactions. |
+ | Q6_K | 🔥 Near-FP16 | 🐌 Slow | ~2.0 GB | Excellent fidelity; great for RAG and retrieval. |
+ | Q8_0 | 🏆 Lossless* | 🐌 Slow | ~2.3 GB | Maximum accuracy; recommended when precision matters most. |
+
+ > 💡 **Recommendations by Use Case**
+ >
+ > - 💻 **Low-end CPU / Raspberry Pi / Old Laptop**: `Q4_K_M`
+ > - 🖥️ **Standard Laptop (Intel i5/M1 Mac)**: `Q5_K_M` (optimal balance)
+ > - 🧠 **Reasoning, Coding, Math**: `Q5_K_M` or `Q6_K`
+ > - 🔍 **RAG, Retrieval, Precision Tasks**: `Q6_K` or `Q8_0`
+ > - 📦 **Storage-Constrained Devices**: `Q4_K_S` or `Q4_K_M`
+ > - 🛠️ **Development & Testing**: Test from `Q4_K_M` up to `Q8_0` for robustness.
+
+ ## Usage
+
+ Load this model using:
+ - [OpenWebUI](https://openwebui.com)
+ - [LM Studio](https://lmstudio.ai)
+ - [GPT4All](https://gpt4all.io)
+ - Or directly via `llama.cpp`
+
+ Each quantized model includes its own `README.md` and shares a common `MODELFILE` for optimal configuration.
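+
+ For the `llama.cpp` route, a minimal sketch (assuming this repo is published as `geoffmunn/Qwen3-1.7B-GGUF`; adjust the repo id and quant level to taste):
+
+ ```bash
+ # Fetch one quant level from the Hub, then serve it over an OpenAI-compatible API
+ huggingface-cli download geoffmunn/Qwen3-1.7B-GGUF "Qwen3-1.7B-f16:Q4_K_M.gguf" --local-dir .
+ ./llama-server -m "Qwen3-1.7B-f16:Q4_K_M.gguf" -c 32768 --port 8080
+ # OpenWebUI (or any OpenAI-style client) can now point at http://localhost:8080/v1
+ ```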
+
+ ## Verification
+
+ Use `SHA256SUMS.txt` to verify file integrity:
+
+ ```bash
+ sha256sum -c SHA256SUMS.txt
+ ```
+
+ ## Author
+
+ 👤 Geoff Munn (@geoffmunn)
+ 🔗 [Hugging Face Profile](https://huggingface.co/geoffmunn)
+
+ ## Disclaimer
+
+ This is a community conversion for local inference. Not affiliated with Alibaba Cloud or the Qwen team.
SHA256SUMS.txt ADDED
@@ -0,0 +1,9 @@
+ 8111eca8604b050a55a8c370693b6dfc14cb4dd283bbdba86612a423686eb350  Qwen3-1.7B-f16:Q2_K.gguf
+ f90c4aba543ee3d50c962574cf1d9daa58d6dc0a1f59fdeee5564eaad71b02b3  Qwen3-1.7B-f16:Q3_K_M.gguf
+ a53a624119885c54c2cd7592e2e7d45b6992c35a8b585cff698e3017830d9517  Qwen3-1.7B-f16:Q3_K_S.gguf
+ b678ed9ba30b2cd7b9546e0a16dd3e28e5d9986b3b2dfe0dc7442176d4f44015  Qwen3-1.7B-f16:Q4_K_M.gguf
+ f22681a7f8ad52f933888ddd57d0cde806af99c0bfb9ad7cbaf3ccfa2127ee4b  Qwen3-1.7B-f16:Q4_K_S.gguf
+ 20d8d6dfb5d25b4b372e3a23e7da6727697dbfc6eb885d363b1dfcbced645759  Qwen3-1.7B-f16:Q5_K_M.gguf
+ 8c1d0728d84cdbdc8e7dd862697b3870b2f3ff820145533ddc8a9ac01d411c6e  Qwen3-1.7B-f16:Q5_K_S.gguf
+ 7ccfe899d107d9517170c720087bf72f39fa284a1e51ace561221fda50b37956  Qwen3-1.7B-f16:Q6_K.gguf
+ 80266a1383b16459eee9f267f01eade72affad3bf4fb28f4c1705d66d9bd7222  Qwen3-1.7B-f16:Q8_0.gguf