---
base_model:
- black-forest-labs/FLUX.1-dev
pipeline_tag: text-to-image
tags:
- gguf
- flux
- text-to-image
- imatrix
---

# Supported?

Expect broken or faulty items for the time being. Use at your own discretion.

- ComfyUI-GGUF: all? (CPU/CUDA)
  - Fast dequant: BF16, Q8_0, Q5_1, Q5_0, Q4_1, Q4_0, Q6_K, Q5_K, Q4_K, Q3_K, Q2_K
  - Slow dequant: others [via GGUF/NumPy](https://github.com/city96/ComfyUI-GGUF/blob/379175e7bf8b65019cdd11108bb882120a6f17df/dequant.py#L24-L28)
- Forge: TBC
- stable-diffusion.cpp: [llama.cpp Feature-matrix](https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix)
  - CPU: all
  - CUDA: all?
  - Vulkan: >= Q3_K_S, > IQ4_S; [PR IQ1_S, IQ1_M](https://github.com/ggerganov/llama.cpp/pull/11528) [PR IQ4_XS](https://github.com/ggerganov/llama.cpp/pull/11501)
  - other: ?
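
Whether a backend lands in the "fast" or "slow" bucket above comes down to whether it ships a native kernel for that block layout; the fallback dequantizes via GGUF/NumPy. As an illustration (a minimal sketch, not ComfyUI-GGUF's actual code), Q8_0 is simple enough to unpack by hand: each 34-byte block is one float16 scale followed by 32 int8 quants, and each weight is `scale * quant`.

```python
import numpy as np

QK8_0 = 32  # Q8_0 block size: 32 weights per block

def dequantize_q8_0(raw: np.ndarray) -> np.ndarray:
    """Dequantize raw Q8_0 bytes: each 34-byte block holds one float16
    scale followed by 32 int8 quants; weight = scale * quant."""
    blocks = raw.reshape(-1, 2 + QK8_0)                        # 34 B/block
    scales = blocks[:, :2].copy().view(np.float16).astype(np.float32)
    quants = blocks[:, 2:].view(np.int8).astype(np.float32)
    return (scales * quants).reshape(-1)                       # broadcast scale

# Round-trip one block: quantize 32 values the Q8_0 way, then dequantize.
x = np.linspace(-1.0, 1.0, QK8_0, dtype=np.float32)
d = np.abs(x).max() / 127.0                                    # per-block scale
q = np.round(x / d).astype(np.int8)
raw = np.frombuffer(np.float16(d).tobytes() + q.tobytes(), dtype=np.uint8)
deq = dequantize_q8_0(raw)
```

The round-trip error stays within about half the scale `d`, which is why Q8_0 is close to lossless in practice.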

# Bravo
Combined imatrix computed from multiple images at 512x512, 25 and 50 steps, generated with [city96/flux1-dev-Q8_0](https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q8_0.gguf) using the euler sampler.

Using [llama.cpp quantize cae9fb4](https://github.com/ggerganov/llama.cpp/commit/cae9fb4361138b937464524eed907328731b81f6) with modified [lcpp.patch](https://github.com/city96/ComfyUI-GGUF/blob/main/tools/lcpp.patch).
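The build-and-quantize step above can be sketched roughly as follows (paths and the patch location are assumptions based on the linked repos, not a verified recipe):

```shell
# Build the referenced llama.cpp commit with city96's lcpp.patch applied
# (patch path assumed; adjust to where ComfyUI-GGUF is checked out).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout cae9fb4
git apply ../ComfyUI-GGUF/tools/lcpp.patch
cmake -B build
cmake --build build --config Release -j

# Quantize a FLUX GGUF, steering low-bit layouts with the importance matrix.
./build/bin/llama-quantize --imatrix imatrix.dat \
    flux1-dev-F16.gguf flux1-dev-IQ2_XS.gguf IQ2_XS
```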

## Experimental from f16

| Filename | Quant type | File Size | Description | Example Image |
| -------- | ---------- | --------- | ----------- | ------------- |
| [flux1-dev-IQ1_S.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-f16-combined/flux1-dev-IQ1_S.gguf) | IQ1_S | 2.45GB | bad quality | [Example](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-f16-combined/images/output_test_comb_IQ1_S_512_25_woman.png) |
| [flux1-dev-IQ1_M.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-f16-combined/flux1-dev-IQ1_M.gguf) | IQ1_M | 2.72GB | bad quality | [Example](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-f16-combined/images/output_test_comb_IQ1_M_512_25_woman.png) |
| [flux1-dev-IQ2_XXS.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-f16-combined/flux1-dev-IQ2_XXS.gguf) | IQ2_XXS | 3.19GB | TBC | [Example](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-f16-combined/images/output_test_comb_IQ2_XXS_512_25_woman.png) |
| [flux1-dev-IQ2_XS.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-f16-combined/flux1-dev-IQ2_XS.gguf) | IQ2_XS | 3.56GB | TBC | - |
| [flux1-dev-IQ2_S.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-f16-combined/flux1-dev-IQ2_S.gguf) | IQ2_S | 3.56GB | TBC | - |
| [flux1-dev-IQ2_M.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-f16-combined/flux1-dev-IQ2_M.gguf) | IQ2_M | 3.93GB | TBC | - |
| [flux1-dev-IQ3_XXS.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-f16-combined/flux1-dev-IQ3_XXS.gguf) | IQ3_XXS | 4.66GB | TBC | [Example](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-f16-combined/images/output_test_comb_IQ3_XXS_512_25_woman.png) |
| [flux1-dev-IQ3_XS.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-f16-combined/flux1-dev-IQ3_XS.gguf) | IQ3_XS | 5.22GB | TBC | [Example](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-f16-combined/images/output_test_comb_IQ3_XS_512_25_woman.png) |

## Observations

- Is Bravo IQ1_S worse than Alpha IQ1_S?

# Alpha
Simple imatrix: single 512x512 image, 8/20 steps, generated with [city96/flux1-dev-Q3_K_S](https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q3_K_S.gguf) using the euler sampler.

Imatrix data: `load_imatrix: loaded 314 importance matrix entries from imatrix.dat computed on 7 chunks`.

Using [llama.cpp quantize cae9fb4](https://github.com/ggerganov/llama.cpp/commit/cae9fb4361138b937464524eed907328731b81f6) with modified [lcpp.patch](https://github.com/city96/ComfyUI-GGUF/blob/main/tools/lcpp.patch).

## Experimental from q8

| Filename | Quant type | File Size | Description | Example Image |
| -------- | ---------- | --------- | ----------- | ------------- |
| [flux1-dev-IQ1_S.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-IQ1_S.gguf) | IQ1_S | 2.45GB | obviously bad quality | [Example](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/images/output_test_IQ1_S_512_25_woman.png) |
| - | IQ1_M | - | broken | - |
| [flux1-dev-TQ1_0.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-TQ1_0.gguf) | TQ1_0 | 2.63GB | TBC | - |
| [flux1-dev-TQ2_0.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-TQ2_0.gguf) | TQ2_0 | 3.19GB | TBC | - |
| [flux1-dev-IQ2_XXS.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-IQ2_XXS.gguf) | IQ2_XXS | 3.19GB | TBC | [Example](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/images/output_test_IQ2_XXS_512_25_woman.png) |
| [flux1-dev-IQ2_XS.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-IQ2_XS.gguf) | IQ2_XS | 3.56GB | TBC | [Example](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/images/output_test_IQ2_XS_512_25_woman.png) |
| [flux1-dev-IQ2_S.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-IQ2_S.gguf) | IQ2_S | 3.56GB | TBC | - |
| [flux1-dev-IQ2_M.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-IQ2_M.gguf) | IQ2_M | 3.93GB | TBC | - |
| [flux1-dev-Q2_K.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-Q2_K.gguf) | Q2_K | 4.02GB | TBC | - |
| [flux1-dev-Q2_K_S.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-Q2_K_S.gguf) | Q2_K_S | 4.02GB | TBC | - |
| [flux1-dev-IQ3_XXS.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-IQ3_XXS.gguf) | IQ3_XXS | 4.66GB | TBC | - |
| [flux1-dev-IQ3_XS.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-IQ3_XS.gguf) | IQ3_XS | 5.22GB | TBC | - |
| [flux1-dev-IQ3_S.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-IQ3_S.gguf) | IQ3_S | 5.22GB | TBC | - |
| [flux1-dev-IQ3_M.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-IQ3_M.gguf) | IQ3_M | 5.22GB | TBC | - |
| [flux1-dev-Q3_K_S.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-Q3_K_S.gguf) | Q3_K_S | 5.22GB | TBC | [Example](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/images/output_test_Q3_K_S_512_25_woman.png) |
| [flux1-dev-Q3_K_M.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-Q3_K_M.gguf) | Q3_K_M | 5.36GB | TBC | - |
| [flux1-dev-Q3_K_L.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-Q3_K_L.gguf) | Q3_K_L | 5.36GB | TBC | - |
| [flux1-dev-IQ4_XS.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-IQ4_XS.gguf) | IQ4_XS | 6.42GB | TBC | [Example](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/images/output_test_IQ4_XS_512_25_woman.png) |
| [flux1-dev-IQ4_NL.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-IQ4_NL.gguf) | IQ4_NL | 6.79GB | TBC | [Example](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/images/output_test_IQ4_NL_512_25_woman.png) |
| [flux1-dev-Q4_0.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-Q4_0.gguf) | Q4_0 | 6.79GB | TBC | - |
| - | Q4_K | TBC | TBC | - |
| [flux1-dev-Q4_K_S.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-Q4_K_S.gguf) | Q4_K_S | 6.79GB | TBC | [Example](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/images/output_test_Q4_K_S_512_25_woman.png) |
| [flux1-dev-Q4_K_M.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-Q4_K_M.gguf) | Q4_K_M | 6.93GB | TBC | - |
| [flux1-dev-Q4_1.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-Q4_1.gguf) | Q4_1 | 7.53GB | TBC | - |
| [flux1-dev-Q5_K_S.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-Q5_K_S.gguf) | Q5_K_S | 8.27GB | TBC | [Example](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/images/output_test_Q5_K_S_512_25_woman.png) |
| [flux1-dev-Q5_K.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-Q5_K.gguf) | Q5_K | 8.41GB | TBC | - |
| - | Q5_K_M | TBC | TBC | - |
| [flux1-dev-Q6_K.gguf](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/flux1-dev-Q6_K.gguf) | Q6_K | 9.84GB | TBC | - |
| - | Q8_0 | 12.7GB | TBC | [Example](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/images/output_test_Q8_512_25_woman.png) |
| - | F16 | 23.8GB | TBC | [Example](https://huggingface.co/Eviation/flux-imatrix/blob/main/experimental-from-q8/images/output_test_F16_512_25_woman.png) |

## Observations

Sub-quants are not differentiated as expected: IQ2_XS == IQ2_S, IQ3_XS == IQ3_S == IQ3_M, Q3_K_M == Q3_K_L.
- Check if [lcpp_sd3.patch](https://github.com/city96/ComfyUI-GGUF/blob/main/tools/lcpp_sd3.patch) includes more specific quant level logic
- Extrapolate the existing level logic

| Quant type | High-level quants | Mid-level quants | Low-level quants | Average |
| ---------- | ----------------- | ---------------- | ---------------- | ------- |
| IQ1_S | 5.5% 16bpw | - | 94.5% 1.5625bpw | 2.3556bpw |
| IQ2_XXS | 4.2% 16bpw | - | 95.8% 2.0625bpw | 2.6504bpw |
| IQ2_XS | 3.8% 16bpw | - | 96.2% 2.3125bpw | 2.8297bpw |
| IQ2_S | 3.8% 16bpw | - | 96.2% 2.3125bpw | 2.8298bpw |
| IQ2_M | 3.4% 16bpw | - | 96.6% 2.5625bpw | 3.0224bpw |
| Q2_K_S | 3.3% 16bpw | - | 96.7% 2.625bpw | 3.0723bpw |
| IQ3_XXS | 2.9% 16bpw | - | 97.1% 3.0625bpw | 3.4351bpw |
| IQ3_XS | 2.6% 16bpw | - | 97.4% 3.4375bpw | 3.7609bpw |
| IQ3_S | 2.6% 16bpw | - | 97.4% 3.4375bpw | 3.7609bpw |
| IQ3_M | 2.6% 16bpw | - | 97.4% 3.4375bpw | 3.7609bpw |
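
The Average column is just a size-weighted mean of the high- and low-level bits-per-weight: the IQ1_S row works out to 0.055 * 16 + 0.945 * 1.5625 ≈ 2.3566 bpw, slightly off the table value because the printed percentages are themselves rounded. A quick check:

```python
def avg_bpw(high_frac: float, high_bpw: float, low_bpw: float) -> float:
    """Size-weighted average bits-per-weight for a two-class quant mix."""
    return high_frac * high_bpw + (1.0 - high_frac) * low_bpw

# IQ1_S row: 5.5% of weights kept at 16 bpw, the rest at 1.5625 bpw.
print(round(avg_bpw(0.055, 16.0, 1.5625), 4))  # 2.3566 (table: 2.3556)
```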