inferencerlabs / Kimi-K2-Instruct-MLX-3.985bit

Text Generation · 4-bit precision


inferencerlabs committed on Jul 27 · Commit 5d491cd (verified) · 1 parent: 7dcd912

Upload complete model

Files changed (1)

README.md +20 -26

README.md CHANGED

@@ -7,30 +7,24 @@ pipeline_tag: text-generation
 tags:
 - mlx
 ---
-<div>
-  <p style="margin-bottom: 0; margin-top: 0;">
-    <strong>See Kimi-K2 Dynamic MLX in action - <a href="https://youtu.be/-zfUvA2CDqE">https://youtu.be/-zfUvA2CDqE</a></strong>
-  </p>
-<p style="margin-top: 0;margin-bottom: 0;">
-    <em>q3.95bit dynamic quant achieves 1.243 perplexity in our testing, slotting closer to q4 perplexity (1.168) than q3 perplexity (1.900).</em>
-    <div align="center">
-| | |
-|:---:|:---:|
-| **q2** | perplexity: 41.293 |
-| **q3** | perplexity: 1.900 |
-| **q3.95** | perplexity: 1.243 |
-| **q4** | perplexity: 1.168 |
-| **q6** | perplexity: 1.128 |
-| **q8** | perplexity: 1.128 |
-</div>
-  </p>
-</div>
-<h1 style="margin-top: 0rem;">Kimi K2 Usage Notes</h1>
-- Built with a modified version of MLX 0.26
-- Runs on a single M3 Ultra 512GB RAM
-- Requires expanding VRAM limit to at least ~500000 MB (I use 507000 for a larger context window)
-<em>sudo sysctl iogpu.wired_limit_mb=507000</em>
-- Expect ~20 tokens/s
-- For more details see <a href="https://youtu.be/-zfUvA2CDqE">demonstration video</> or visit <a href="https://moonshotai.github.io/Kimi-K2/">Kimi K2</a>.
----
+**See Kimi-K2 Dynamic MLX in action - [https://youtu.be/-zfUvA2CDqE](https://youtu.be/-zfUvA2CDqE)**
+*q3.95bit dynamic quant achieves 1.243 perplexity in our testing, slotting closer to q4 perplexity (1.168) than q3 perplexity (1.900).*
+| Quantization | Perplexity |
+|:------------:|:----------:|
+| **q2**       | 41.293     |
+| **q3**       | 1.900      |
+| **q3.95**    | 1.243      |
+| **q4**       | 1.168      |
+| **q6**       | 1.128      |
+| **q8**       | 1.128      |
+## Kimi K2 Usage Notes
+* Built with a modified version of MLX 0.26
+* Runs on a single M3 Ultra 512GB RAM
+* Requires expanding VRAM limit to at least ~500000 MB (507000 used below for a larger context window)
+  * `sudo sysctl iogpu.wired_limit_mb=507000`
+* Expect ~20 tokens/s
+* For more details see [demonstration video](https://youtu.be/-zfUvA2CDqE) or visit [Kimi K2](https://moonshotai.github.io/Kimi-K2/).
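The perplexity table compares quantization levels by a single number. For reference, the sketch below (Python) implements the standard definition, perplexity = exp(mean negative log-likelihood per token). It is not the authors' evaluation: the dataset, context length, and tooling behind the reported figures are not stated, and the per-token probabilities used here are made up for illustration.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood), using natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Hypothetical per-token probabilities, for illustration only (not measured data).
example = [math.log(p) for p in (0.9, 0.8, 0.7)]
print(f"perplexity: {perplexity(example):.3f}")

# A model that always assigns probability 1 to the correct token reaches the floor of 1.0.
print(f"perfect model: {perplexity([0.0, 0.0, 0.0]):.3f}")
```

Under this definition lower is better and 1.0 is the minimum, so the reported q3.95 result (1.243) sits 0.075 above q4 (1.168) but 0.657 below q3 (1.900).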
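The ~500000 MB wired-limit requirement can be sanity-checked with back-of-envelope arithmetic: weight bytes ≈ parameters × bits per weight / 8. The Python sketch below assumes ~1 trillion total parameters (Kimi K2's advertised total size) and treats each bit width as an effective per-weight average. It ignores the KV cache, activations, and any quantization scale/bias overhead not already folded into that average, so it is a rough lower bound rather than a measurement.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: ~1 trillion total parameters (Kimi K2's advertised size); the exact count may differ.
KIMI_K2_PARAMS = 1.0e12

# Assumption: bit widths are effective averages per weight; KV cache and activations are not counted.
for label, bits in [("q3", 3.0), ("q3.985", 3.985), ("q4", 4.0), ("q8", 8.0)]:
    print(f"{label:>7}: ~{weight_memory_gb(KIMI_K2_PARAMS, bits):,.0f} GB")
```

Under these assumptions the 3.985-bit weights come to roughly 498 GB. That figure is consistent with the ~500000 MB wired limit in the usage notes, and it shows why a q8 build (~1,000 GB) could not fit on a single 512GB machine.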