geoffmunn committed
Commit 6d6402f · verified · 1 Parent(s): 7aebd5c

Add Q2–Q8_0 quantized models with per-model cards, MODELFILE, and auto-upload

.gitattributes CHANGED
@@ -33,3 +33,12 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ Qwen3-1.7B-f16:Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-1.7B-f16:Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-1.7B-f16:Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-1.7B-f16:Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-1.7B-f16:Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-1.7B-f16:Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-1.7B-f16:Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-1.7B-f16:Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-1.7B-f16:Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
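These added entries are exactly what `git lfs track` writes into `.gitattributes`. As a minimal sketch, the same effect could be had with one glob instead of nine explicit filenames (a hypothetical simplification, not what this commit did):

```bash
# From the repo root: track every GGUF file with Git LFS.
# This appends "*.gguf filter=lfs diff=lfs merge=lfs -text" to .gitattributes.
git lfs track "*.gguf"
git add .gitattributes
```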
.prepare_and_upload_qwen3-0.6B.sh.swp ADDED
Binary file (1.02 kB).
 
MODELFILE ADDED
@@ -0,0 +1,25 @@
+ # MODELFILE for Qwen3-1.7B-GGUF
+ # Used by LM Studio, OpenWebUI, GPT4All, etc.
+
+ context_length: 32768
+ embedding: false
+ f16: cpu
+
+ # Chat template using ChatML (the format used by Qwen)
+ prompt_template: >-
+   <|im_start|>system
+   You are a helpful assistant.<|im_end|>
+   <|im_start|>user
+   {prompt}<|im_end|>
+   <|im_start|>assistant
+
+ # Stop sequences help end generation cleanly
+ stop: "<|im_end|>"
+ stop: "<|im_start|>"
+
+ # Default sampling
+ temperature: 0.6
+ top_p: 0.95
+ top_k: 20
+ min_p: 0.0
+ repeat_penalty: 1.1
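Runtimes that read a MODELFILE pick these settings up automatically. When driving llama.cpp directly, the same defaults map onto its sampling flags; a minimal sketch, assuming a locally built `llama-cli` binary and a Q4_K_M file in the working directory:

```bash
# Apply the MODELFILE defaults when invoking llama.cpp directly (paths assumed).
# -cnv starts an interactive chat using the model's built-in ChatML template.
./llama-cli -m Qwen3-1.7B-f16:Q4_K_M.gguf -c 32768 -cnv \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 --repeat-penalty 1.1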
Qwen3-1.7B-Q2_K/README.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - chat
+ base_model: Qwen/Qwen3-1.7B
+ author: geoffmunn
+ ---
+
+ # Qwen3-1.7B-Q2_K
+
+ Quantized version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) at the **Q2_K** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 0.88 GB
+ - **Precision**: Q2_K
+ - **Base Model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | **Quality** | Very Low |
+ | **Speed** | 🚀 Fast |
+ | **RAM Required** | ~0.9 GB |
+ | **Recommendation** | Only on very weak devices; avoid for reasoning. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this template in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see the base model for full terms.
Qwen3-1.7B-Q3_K_M/README.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - chat
+ base_model: Qwen/Qwen3-1.7B
+ author: geoffmunn
+ ---
+
+ # Qwen3-1.7B-Q3_K_M
+
+ Quantized version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) at the **Q3_K_M** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 1.07 GB
+ - **Precision**: Q3_K_M
+ - **Base Model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | **Quality** | Low-Medium |
+ | **Speed** | 🚀 Fast |
+ | **RAM Required** | ~1.3 GB |
+ | **Recommendation** | Acceptable for simple chat on older systems. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this template in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see the base model for full terms.
Qwen3-1.7B-Q3_K_S/README.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - chat
+ base_model: Qwen/Qwen3-1.7B
+ author: geoffmunn
+ ---
+
+ # Qwen3-1.7B-Q3_K_S
+
+ Quantized version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) at the **Q3_K_S** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 1.00 GB
+ - **Precision**: Q3_K_S
+ - **Base Model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | **Quality** | Low |
+ | **Speed** | 🚀 Fast |
+ | **RAM Required** | ~1.1 GB |
+ | **Recommendation** | Minimal viability; basic completion only. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this template in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see the base model for full terms.
Qwen3-1.7B-Q4_K_M/README.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - chat
+ base_model: Qwen/Qwen3-1.7B
+ author: geoffmunn
+ ---
+
+ # Qwen3-1.7B-Q4_K_M
+
+ Quantized version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) at the **Q4_K_M** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 1.28 GB
+ - **Precision**: Q4_K_M
+ - **Base Model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | **Quality** | ✅ Balanced |
+ | **Speed** | 🚀 Fast |
+ | **RAM Required** | ~1.5 GB |
+ | **Recommendation** | Best overall for general use on average hardware. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this template in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see the base model for full terms.
Qwen3-1.7B-Q4_K_S/README.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - chat
+ base_model: Qwen/Qwen3-1.7B
+ author: geoffmunn
+ ---
+
+ # Qwen3-1.7B-Q4_K_S
+
+ Quantized version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) at the **Q4_K_S** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 1.24 GB
+ - **Precision**: Q4_K_S
+ - **Base Model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | **Quality** | Medium |
+ | **Speed** | 🚀 Fast |
+ | **RAM Required** | ~1.4 GB |
+ | **Recommendation** | Good balance for low-end laptops or Mac Minis. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this template in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see the base model for full terms.
Qwen3-1.7B-Q5_K_M/README.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - chat
+ base_model: Qwen/Qwen3-1.7B
+ author: geoffmunn
+ ---
+
+ # Qwen3-1.7B-Q5_K_M
+
+ Quantized version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) at the **Q5_K_M** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 1.47 GB
+ - **Precision**: Q5_K_M
+ - **Base Model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | **Quality** | ✅✅ High |
+ | **Speed** | 🐢 Medium |
+ | **RAM Required** | ~1.7 GB |
+ | **Recommendation** | Top pick for coding, logic, and deeper interactions. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this template in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see the base model for full terms.
Qwen3-1.7B-Q5_K_S/README.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - chat
+ base_model: Qwen/Qwen3-1.7B
+ author: geoffmunn
+ ---
+
+ # Qwen3-1.7B-Q5_K_S
+
+ Quantized version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) at the **Q5_K_S** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 1.44 GB
+ - **Precision**: Q5_K_S
+ - **Base Model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | **Quality** | High |
+ | **Speed** | 🐢 Medium |
+ | **RAM Required** | ~1.6 GB |
+ | **Recommendation** | Better reasoning; slightly faster than Q5_K_M. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this template in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see the base model for full terms.
Qwen3-1.7B-Q6_K/README.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - chat
+ base_model: Qwen/Qwen3-1.7B
+ author: geoffmunn
+ ---
+
+ # Qwen3-1.7B-Q6_K
+
+ Quantized version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) at the **Q6_K** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 1.67 GB
+ - **Precision**: Q6_K
+ - **Base Model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | **Quality** | 🔥 Near-FP16 |
+ | **Speed** | 🐢 Medium |
+ | **RAM Required** | ~2.0 GB |
+ | **Recommendation** | Excellent fidelity; great for RAG and retrieval. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this template in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see the base model for full terms.
Qwen3-1.7B-Q8_0/README.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - chat
+ base_model: Qwen/Qwen3-1.7B
+ author: geoffmunn
+ ---
+
+ # Qwen3-1.7B-Q8_0
+
+ Quantized version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) at the **Q8_0** level, derived from **f16** base weights.
+
+ ## Model Info
+
+ - **Format**: GGUF (for llama.cpp and compatible runtimes)
+ - **Size**: 2.17 GB
+ - **Precision**: Q8_0
+ - **Base Model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
+ - **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Quality & Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | **Quality** | 🏆 Lossless* |
+ | **Speed** | 🐢 Medium |
+ | **RAM Required** | ~2.3 GB |
+ | **Recommendation** | Maximum accuracy; recommended when precision matters most. |
+
+ ## Prompt Template (ChatML)
+
+ This model uses the **ChatML** format used by Qwen:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Set this template in your app (LM Studio, OpenWebUI, etc.) for best results.
+
+ ## Generation Parameters
+
+ Recommended defaults:
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Temperature | 0.6 |
+ | Top-P | 0.95 |
+ | Top-K | 20 |
+ | Min-P | 0.0 |
+ | Repeat Penalty | 1.1 |
+
+ Stop sequences: `<|im_end|>`, `<|im_start|>`
+
+ ## Verification
+
+ Check integrity:
+
+ ```bash
+ sha256sum -c ../SHA256SUMS.txt
+ ```
+
+ ## Usage
+
+ Compatible with:
+ - [LM Studio](https://lmstudio.ai)
+ - [OpenWebUI](https://openwebui.com)
+ - [GPT4All](https://gpt4all.io)
+ - Directly via llama.cpp
+
+ ## License
+
+ Apache 2.0 – see the base model for full terms.
Qwen3-1.7B-f16:Q2_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8111eca8604b050a55a8c370693b6dfc14cb4dd283bbdba86612a423686eb350
+ size 879896768
Qwen3-1.7B-f16:Q3_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f90c4aba543ee3d50c962574cf1d9daa58d6dc0a1f59fdeee5564eaad71b02b3
+ size 1073242304
Qwen3-1.7B-f16:Q3_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a53a624119885c54c2cd7592e2e7d45b6992c35a8b585cff698e3017830d9517
+ size 1000956096
Qwen3-1.7B-f16:Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b678ed9ba30b2cd7b9546e0a16dd3e28e5d9986b3b2dfe0dc7442176d4f44015
+ size 1282439360
Qwen3-1.7B-f16:Q4_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f22681a7f8ad52f933888ddd57d0cde806af99c0bfb9ad7cbaf3ccfa2127ee4b
+ size 1235220672
Qwen3-1.7B-f16:Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:20d8d6dfb5d25b4b372e3a23e7da6727697dbfc6eb885d363b1dfcbced645759
+ size 1471805632
Qwen3-1.7B-f16:Q5_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8c1d0728d84cdbdc8e7dd862697b3870b2f3ff820145533ddc8a9ac01d411c6e
+ size 1444509888
Qwen3-1.7B-f16:Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7ccfe899d107d9517170c720087bf72f39fa284a1e51ace561221fda50b37956
+ size 1673007296
Qwen3-1.7B-f16:Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:80266a1383b16459eee9f267f01eade72affad3bf4fb28f4c1705d66d9bd7222
+ size 2165039296
README.md ADDED
@@ -0,0 +1,75 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - chat
+ - reasoning
+ base_model: Qwen/Qwen3-1.7B
+ author: geoffmunn
+ pipeline_tag: text-generation
+ language:
+ - en
+ - zh
+ ---
+
+ # Qwen3-1.7B-GGUF
+
+ This is a **GGUF-quantized version** of the **[Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)** language model, converted for use with `llama.cpp` and compatible inference engines such as OpenWebUI, LM Studio, and GPT4All.
+
+ **Qwen3-1.7B** is a lightweight yet capable LLM, ideal for local deployment on consumer hardware. It balances speed and quality for everyday tasks like casual conversation, summarization, code snippets, and personal AI assistance, all while running fully offline.
+
+ ## Available Quantizations (from f16)
+
+ These variants were built from an **f16** base model to ensure consistency across quant levels.
+
+ | Level | Quality | Speed | Size Est. | Recommendation |
+ |----------|--------------|----------|-----------|----------------|
+ | Q2_K | Very Low | ⚡ Fastest | ~0.9 GB | Only on very weak devices; avoid for reasoning. |
+ | Q3_K_S | Low | ⚡ Fast | ~1.1 GB | Minimal viability; basic completion only. |
+ | Q3_K_M | Low-Medium | ⚡ Fast | ~1.3 GB | Acceptable for simple chat on older systems. |
+ | Q4_K_S | Medium | 🚀 Fast | ~1.4 GB | Good balance for low-end laptops or Mac Minis. |
+ | Q4_K_M | ✅ Balanced | 🚀 Fast | ~1.5 GB | Best overall for general use on average hardware. |
+ | Q5_K_S | High | 🐢 Medium | ~1.6 GB | Better reasoning; slightly faster than Q5_K_M. |
+ | Q5_K_M | ✅✅ High | 🐢 Medium | ~1.7 GB | Top pick for coding, logic, and deeper interactions. |
+ | Q6_K | 🔥 Near-FP16 | 🐌 Slow | ~2.0 GB | Excellent fidelity; great for RAG and retrieval. |
+ | Q8_0 | 🏆 Lossless* | 🐌 Slow | ~2.3 GB | Maximum accuracy; recommended when precision matters most. |
+
+ > 💡 **Recommendations by Use Case**
+ >
+ > - 💻 **Low-end CPU / Raspberry Pi / Old Laptop**: `Q4_K_M`
+ > - 🖥️ **Standard Laptop (Intel i5/M1 Mac)**: `Q5_K_M` (optimal balance)
+ > - 🧠 **Reasoning, Coding, Math**: `Q5_K_M` or `Q6_K`
+ > - 🔍 **RAG, Retrieval, Precision Tasks**: `Q6_K` or `Q8_0`
+ > - 📦 **Storage-Constrained Devices**: `Q4_K_S` or `Q4_K_M`
+ > - 🛠️ **Development & Testing**: Test from `Q4_K_M` up to `Q8_0` for robustness.
+
+ ## Usage
+
+ Load this model using:
+ - [OpenWebUI](https://openwebui.com)
+ - [LM Studio](https://lmstudio.ai)
+ - [GPT4All](https://gpt4all.io)
+ - Or directly via `llama.cpp`
+
+ Each quantized model includes its own `README.md` and shares a common `MODELFILE` for optimal configuration.
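+
+ For the `llama.cpp` route, a minimal sketch (assuming this repo is published as `geoffmunn/Qwen3-1.7B-GGUF`; adjust the repo id and quant level to taste):
+
+ ```bash
+ # Fetch one quant level from the Hub, then serve it over an OpenAI-compatible API
+ huggingface-cli download geoffmunn/Qwen3-1.7B-GGUF "Qwen3-1.7B-f16:Q4_K_M.gguf" --local-dir .
+ ./llama-server -m "Qwen3-1.7B-f16:Q4_K_M.gguf" -c 32768 --port 8080
+ # OpenWebUI (or any OpenAI-style client) can now point at http://localhost:8080/v1
+ ```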
+
+ ## Verification
+
+ Use `SHA256SUMS.txt` to verify file integrity:
+
+ ```bash
+ sha256sum -c SHA256SUMS.txt
+ ```
+
+ ## Author
+
+ 👤 Geoff Munn (@geoffmunn)
+ 🔗 [Hugging Face Profile](https://huggingface.co/geoffmunn)
+
+ ## Disclaimer
+
+ This is a community conversion for local inference. Not affiliated with Alibaba Cloud or the Qwen team.
SHA256SUMS.txt ADDED
@@ -0,0 +1,9 @@
+ 8111eca8604b050a55a8c370693b6dfc14cb4dd283bbdba86612a423686eb350  Qwen3-1.7B-f16:Q2_K.gguf
+ f90c4aba543ee3d50c962574cf1d9daa58d6dc0a1f59fdeee5564eaad71b02b3  Qwen3-1.7B-f16:Q3_K_M.gguf
+ a53a624119885c54c2cd7592e2e7d45b6992c35a8b585cff698e3017830d9517  Qwen3-1.7B-f16:Q3_K_S.gguf
+ b678ed9ba30b2cd7b9546e0a16dd3e28e5d9986b3b2dfe0dc7442176d4f44015  Qwen3-1.7B-f16:Q4_K_M.gguf
+ f22681a7f8ad52f933888ddd57d0cde806af99c0bfb9ad7cbaf3ccfa2127ee4b  Qwen3-1.7B-f16:Q4_K_S.gguf
+ 20d8d6dfb5d25b4b372e3a23e7da6727697dbfc6eb885d363b1dfcbced645759  Qwen3-1.7B-f16:Q5_K_M.gguf
+ 8c1d0728d84cdbdc8e7dd862697b3870b2f3ff820145533ddc8a9ac01d411c6e  Qwen3-1.7B-f16:Q5_K_S.gguf
+ 7ccfe899d107d9517170c720087bf72f39fa284a1e51ace561221fda50b37956  Qwen3-1.7B-f16:Q6_K.gguf
+ 80266a1383b16459eee9f267f01eade72affad3bf4fb28f4c1705d66d9bd7222  Qwen3-1.7B-f16:Q8_0.gguf