ubergarm committed
Commit 86255df · 1 Parent(s): 3ed8bb9

initial commit

Files changed (2):
1. .gitattributes +3 -0
2. README.md +170 -3
.gitattributes CHANGED
```diff
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+imatrix-*.dat filter=lfs diff=lfs merge=lfs -text
+*.gguf filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
```
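The three added patterns route the imatrix data, GGUF weights, and chart images through Git LFS. For anyone reproducing this setup, the same lines can be generated with `git lfs track` (a sketch, assuming `git-lfs` is installed and you are inside the repo):

```bash
# Each invocation appends a "filter=lfs diff=lfs merge=lfs -text" line to .gitattributes
git lfs track "imatrix-*.dat"
git lfs track "*.gguf"
git lfs track "*.png"
```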
README.md CHANGED
```diff
@@ -1,3 +1,170 @@
----
-license: mit
----
```
---
quantized_by: ubergarm
pipeline_tag: text-generation
base_model: deepseek-ai/DeepSeek-V3.1
license: mit
base_model_relation: quantized
tags:
- mla
- imatrix
- deepseek_v3
- conversational
- ik_llama.cpp
---

Still working on this collection, so be patient; I may change the exact quants released depending on how they are looking. Open a discussion if you have a specific target RAM+VRAM size quant in mind for your rig.

## `ik_llama.cpp` imatrix Quantizations of deepseek-ai/DeepSeek-V3.1
This quant collection **REQUIRES** the [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp/) fork to support ik's latest SOTA quants and optimizations! Do **not** download these big files and expect them to run on mainline vanilla llama.cpp, ollama, LM Studio, KoboldCpp, etc.!

*NOTE*: `ik_llama.cpp` can also run your existing GGUFs from bartowski, unsloth, mradermacher, etc. if you want to try it out before downloading my quants.

Some of ik's new quants are supported in the [Nexesenex/croco.cpp](https://github.com/Nexesenex/croco.cpp) fork of KoboldCpp, with Windows builds for CUDA 12.9. Also check out the [Windows builds by Thireus here](https://github.com/Thireus/ik_llama.cpp/releases), which have been built against CUDA 12.8.

These quants provide best-in-class perplexity for a given memory footprint.

## Big Thanks
Shout out to Wendell and the **Level1Techs** crew, the community [Forums](https://forum.level1techs.com/t/deepseek-deep-dive-r1-at-home/225826), and the [YouTube Channel](https://www.youtube.com/@Level1Techs)! **BIG thanks** for providing **BIG hardware** expertise and access to run these experiments and make these great quants available to the community!!!

Also thanks to all the folks in the quanting and inferencing community on the [BeaverAI Club Discord](https://huggingface.co/BeaverAI) and on [r/LocalLLaMA](https://www.reddit.com/r/LocalLLaMA/) for tips and tricks helping each other run, test, and benchmark all the fun new models!

## Quant Collection
Perplexity computed against *wiki.test.raw*.

![Perplexity Chart](images/perplexity.png "Chart showing Perplexity improving as BPW increases.")

These first two are just test quants for baseline perplexity comparison:
* `BF16` TODO
  - Final estimate: PPL = TODO
* `Q8_0` TODO
  - Final estimate: PPL = TODO

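For context, the `Final estimate: PPL` figures in this card come from running the quants over *wiki.test.raw* with `ik_llama.cpp`'s `llama-perplexity` tool. A minimal sketch of such a run (the model path and thread count are placeholders for your own setup):

```bash
# Hypothetical perplexity run; prints "Final estimate: PPL = ..." at the end
./build/bin/llama-perplexity \
    --model /path/to/DeepSeek-V3.1-IQ4_K.gguf \
    -f wiki.test.raw \
    --threads 24
```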
## IQ5_K TODO
Final estimate: PPL = TODO

<details>

<summary>👈 Secret Recipe</summary>

```bash
echo TODO
```

</details>

## IQ5_KS TODO
Final estimate: PPL = TODO

<details>

<summary>👈 Secret Recipe</summary>

```bash
echo TODO
```

</details>

## IQ4_K TODO
Final estimate: PPL = TODO

<details>

<summary>👈 Secret Recipe</summary>

```bash
echo TODO
```

</details>

## IQ4_KSS TODO
Final estimate: PPL = TODO

<details>

<summary>👈 Secret Recipe</summary>

```bash
echo TODO
```

</details>

## IQ3_KS TODO
Final estimate: PPL = TODO

<details>

<summary>👈 Secret Recipe</summary>

```bash
echo TODO
```

</details>

## IQ2_KL TODO
Final estimate: PPL = TODO

<details>

<summary>👈 Secret Recipe</summary>

```bash
echo TODO
```

</details>

## IQ1_S TODO
Final estimate: PPL = TODO

<details>

<summary>👈 Secret Recipe</summary>

```bash
echo TODO
```

</details>

## IQ1_KT TODO
Final estimate: PPL = TODO

<details>

<summary>👈 Secret Recipe</summary>

```bash
echo TODO
```

</details>

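While the recipes above are still TODO, they will generally follow the usual `ik_llama.cpp` shape: `llama-quantize` driven by an imatrix file, with per-tensor overrides via `--custom-q`. A hedged sketch only; every path, the override regex, and the thread count below are placeholders, not the actual recipe:

```bash
# Hypothetical quantization run: imatrix-guided, with one example tensor override
./build/bin/llama-quantize \
    --imatrix /path/to/imatrix-DeepSeek-V3.1.dat \
    --custom-q "token_embd\.weight=q8_0" \
    /path/to/DeepSeek-V3.1-BF16.gguf \
    /path/to/DeepSeek-V3.1-IQ4_K.gguf \
    IQ4_K \
    24
```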
## Quick Start
```bash
# Clone and checkout
$ git clone https://github.com/ikawrakow/ik_llama.cpp
$ cd ik_llama.cpp

# Build for hybrid CPU+CUDA
$ cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON -DGGML_BLAS=OFF -DGGML_SCHED_MAX_COPIES=1
$ cmake --build build --config Release -j $(nproc)

# Run API server
$ echo TODO
```

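Until the server command above is filled in, here is a hedged sketch of what a hybrid CPU+CUDA launch typically looks like for DeepSeek-style quants on `ik_llama.cpp`. The model path, layer count, and thread count are placeholders for your rig; `-mla`, `-amb`, `-fmoe`, and `-ot` are ik_llama.cpp-specific flags:

```bash
# Hypothetical hybrid launch: attention on GPU, routed experts offloaded to CPU
./build/bin/llama-server \
    --model /path/to/DeepSeek-V3.1-IQ4_K.gguf \
    --ctx-size 32768 \
    -ctk q8_0 \
    -fa -mla 3 -amb 512 -fmoe \
    --n-gpu-layers 99 \
    -ot exps=CPU \
    --threads 24 \
    --host 127.0.0.1 --port 8080
```

The `-ot exps=CPU` override is what makes the RAM+VRAM split work: expert tensors stay in system RAM while the rest runs on the GPU.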
## References
* [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp)
* [Getting Started Guide (already out of date lol)](https://github.com/ikawrakow/ik_llama.cpp/discussions/258)
* [Quant Cookers Guide](https://github.com/ikawrakow/ik_llama.cpp/discussions/434)
* [Compiling triton-cpu](https://github.com/triton-lang/triton-cpu/issues/237#issuecomment-2878180022)
* [fp8 to bf16 safetensors casting without GPU](https://github.com/ggml-org/llama.cpp/issues/14762#issuecomment-3098571703)
* [avx512 avx_vnni Zen5 experimental optimizations](https://github.com/ikawrakow/ik_llama.cpp/pull/710)
* [ubergarm-imatrix-calibration-corpus-v02.txt](https://gist.github.com/ubergarm/edfeb3ff9c6ec8b49e88cdf627b0711a?permalink_comment_id=5682584#gistcomment-5682584)