---
base_model: black-forest-labs/FLUX.1-dev
library_name: gguf
license: other
license_name: flux-1-dev-non-commercial-license
license_link: LICENSE.md
quantized_by: mo137
tags:
- text-to-image
- image-generation
- flux
---

Flux.1-dev in a few experimental custom formats, mixing tensors in **Q8_0**, **fp16**, and **fp32**.
Converted from black-forest-labs' original bf16 weights.

### Motivation
Flux's weights were published in bf16.
Conversion to fp16 is slightly lossy, but fp32 is lossless.
I experimented with mixed tensor formats to see if mixing would improve quality.
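
To make that concrete: fp16 actually has more mantissa bits than bf16 (10 vs. 7) but far fewer exponent bits (5 vs. 8), so a bf16 value converts to fp16 exactly unless its magnitude falls outside fp16's range, while fp32 can represent every bf16 value. A small illustration (PyTorch is assumed here purely for its bfloat16 support):

```python
import torch

big = torch.tensor(1e20, dtype=torch.bfloat16)
print(big.to(torch.float32))   # the bf16 value survives exactly: fp32 is a superset of bf16
print(big.to(torch.float16))   # inf: above fp16's max finite value (~65504)

tiny = torch.tensor(1e-30, dtype=torch.bfloat16)
print(tiny.to(torch.float16))  # 0.0: below fp16's smallest subnormal (~6e-8)
```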

### Evaluation
I tried comparing the outputs, but I can't say with any certainty that these models are significantly better than pure Q8_0.
You're probably better off using Q8_0, but I thought I'd share these – maybe someone will find them useful.

Higher bits per weight (bpw) means slower computation:
```
 20 s  Q8_0
 23 s  11.0bpw-txt16
 30 s  fp16
 37 s  16.4bpw-txt32
310 s  fp32
```

In the txt16/32 files, I quantized only these layers to Q8_0, unless they were one-dimensional:
```
img_mlp.0
img_mlp.2
img_mod.lin
linear1
linear2
modulation.lin
```
But I left all of these at fp16 or fp32, respectively:
```
txt_mlp.0
txt_mlp.2
txt_mod.lin
```
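
In code, the rule is just a name and shape check. Here is a minimal sketch, assuming numpy tensors; `should_quantize` and `quantize_q8_0` are hypothetical helpers (not the actual conversion script), and the Q8_0 layout shown, blocks of 32 int8 values sharing one fp16 scale, is the standard GGML definition:

```python
import numpy as np

# Layers whose 2-D weights go to Q8_0 in the txt16/txt32 mixes;
# txt_mlp.0, txt_mlp.2 and txt_mod.lin are deliberately left at full precision.
Q8_0_LAYERS = (
    "img_mlp.0", "img_mlp.2", "img_mod.lin",
    "linear1", "linear2", "modulation.lin",
)

def should_quantize(name: str, tensor: np.ndarray) -> bool:
    """Selection rule: only the listed layers, and never 1-D tensors."""
    if tensor.ndim == 1:                 # biases, norm scales, etc. stay as-is
        return False
    base = name.rsplit(".weight", 1)[0]  # strip a trailing ".weight" suffix
    return base.endswith(Q8_0_LAYERS)    # str.endswith accepts a tuple

def quantize_q8_0(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Reference Q8_0: blocks of 32 weights, one fp16 scale per block.
    Assumes x.size is a multiple of 32, which holds for Flux's 2-D weights."""
    blocks = x.astype(np.float32).reshape(-1, 32)
    scale = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0.0] = 1.0            # all-zero block: avoid division by zero
    q = np.round(blocks / scale).astype(np.int8)
    return q, scale.astype(np.float16)
```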

The bpw number in the file names is just an approximation derived from file size.
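
Concretely, the estimate is file size in bits divided by the weight count. A quick sketch, assuming roughly 12 billion parameters for the Flux.1-dev transformer (an approximation, not an exact figure):

```python
import os

def approx_bpw(path: str, n_params: float = 12e9) -> float:
    # n_params: assumed approximate weight count of the Flux.1-dev transformer
    return os.path.getsize(path) * 8 / n_params

# e.g. a ~16.5 GB file works out to about 11 bpw.
```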

---

This is a direct GGUF conversion of [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main).

As this is a quantized model, not a finetune, all of the original license terms and restrictions still apply.

The model files can be used with the [ComfyUI-GGUF](https://github.com/city96/ComfyUI-GGUF) custom node.

Place model files in `ComfyUI/models/unet` - see the GitHub readme for further install instructions.

Please refer to [this chart](https://github.com/ggerganov/llama.cpp/blob/master/examples/perplexity/README.md#llama-3-8b-scoreboard) for a basic overview of quantization types.

(Model card mostly copied from [city96/FLUX.1-dev-gguf](https://huggingface.co/city96/FLUX.1-dev-gguf) - which contains conventional and useful GGUF files.)