FP8/4bit version please

#7
by zhanghx0905 - opened

FP8/4bit version please

An AutoRound GPTQ quant would be better; a rough sketch is below.
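Something like this should do it with Intel's auto-round (untested sketch; the model ID is a placeholder, and the exact API may differ slightly between auto-round versions):

```python
# Sketch: 4-bit GPTQ-format quantization with Intel's auto-round.
# Assumes `pip install auto-round` and enough memory to load the model
# for calibration. "org/model-name" is a placeholder repo ID.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "org/model-name"  # placeholder, swap in the actual repo ID
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 4-bit, group size 128, symmetric quantization
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()

# Export in GPTQ format so it loads with the usual GPTQ runtimes
autoround.save_quantized("./model-4bit-gptq", format="auto_gptq", inplace=True)
```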

Are there any instructions on how to run this locally with 5x 5090s?

@mtcl I've got 7x 5090s, let me know if you figure it out. Seems like 4-bit quantization should work for me, but I'm still getting OOMs for some reason.

What command are you using to run it? What software are you using?

@mtcl Trying to run it through Transformers. I tried loading in 4-bit with both the Transformers `load_in_4bit` argument and a bitsandbytes config directly; sketch below. How about you?
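For reference, a minimal sketch of the bitsandbytes path (the model ID is a placeholder). In theory `device_map="auto"` should shard the weights across all the visible GPUs; without it everything lands on one card, which would explain OOMs on a multi-GPU box:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "org/model-name"  # placeholder, swap in the actual repo ID

# NF4 4-bit quantization via bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # shard layers across all visible GPUs, not just one
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```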
