Model request

#1
by pathosethoslogos - opened

> If there are other models you're interested in seeing quantized to NVFP4 for use on the DGX Spark, or other modern Blackwell (or newer) cards, let me know

Sorry, unsure where I could let you know, so I'm posting here.

Would GPT OSS 120B NVFP4, upstage/Solar-Open-100B NVFP4, and IQuest Coder V1 40B Loop Thinking NVFP4 be possible?

This is a fine place to do it. I can give Solar Open 100B a shot and see whether the tools currently work for it (it appears to be a brand-new model, so there could be issues). As for GPT OSS 120B, I did attempt it a while back, but I kept getting a resulting quant that was larger than the original (240GB). I can try again now and see whether my updated quantization script, or newer versions of everything, produces a better result.
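For anyone curious, the core of such a script looks roughly like the sketch below. This is a minimal example assuming llm-compressor's NVFP4 scheme, not my exact script; the model ID, save directory, dataset, and calibration settings are all placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "upstage/Solar-Open-100B"  # placeholder; swap in the target model
SAVE_DIR = "Solar-Open-100B-NVFP4"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP4 weights and activations with FP8 block scales; lm_head stays in
# higher precision, as is typical for these quants.
recipe = QuantizationModifier(targets="Linear", scheme="NVFP4", ignore=["lm_head"])

# Calibration data is needed to fit the activation scales; the dataset
# and sample counts here are placeholders.
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```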

Thanks!

> GPT OSS 120B, I did attempt it a while back

I see, I thought something would be up. I looked at shanjiaz/gpt-oss-120b-nvfp4-modelopt's config.json, and the layer_types param made me assume they had to use a workaround.
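For reference, those fields can be checked without downloading the weights. A quick sketch using huggingface_hub; only layer_types is confirmed from that config, the quantization_config lookup is just the usual HF config convention:

```python
import json
from huggingface_hub import hf_hub_download

# Fetch only config.json from the repo mentioned above.
path = hf_hub_download("shanjiaz/gpt-oss-120b-nvfp4-modelopt", "config.json")
with open(path) as f:
    cfg = json.load(f)

# layer_types is the param that suggested a per-layer workaround;
# quantization_config (if present) shows how the quant was declared.
print(cfg.get("layer_types"))
print(json.dumps(cfg.get("quantization_config", {}), indent=2))
```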

> As for GPT OSS 120B, I did attempt it a while back, but I kept getting a resulting quant that was larger than the original (240GB).

Official GPT-OSS-120B is already in MXFP4, so there isn't much point in quantizing it to NVFP4. MXFP4 is also hardware-accelerated on Blackwell. Well, on B200 at least; on the RTX Pro 6000 it still uses the Marlin kernel.
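For a rough sense of the sizes involved, here's the back-of-the-envelope math for the two formats (a sketch; block sizes per the OCP MX spec and NVIDIA's NVFP4 format, and the 120B parameter count is approximate):

```python
# MXFP4: 4-bit E2M1 values, one 8-bit E8M0 scale per 32-element block.
# NVFP4: 4-bit E2M1 values, one 8-bit E4M3 scale per 16-element block
#        (plus a per-tensor FP32 scale, negligible at this size).
def bits_per_weight(elem_bits: int, scale_bits: int, block_size: int) -> float:
    return elem_bits + scale_bits / block_size

params = 120e9  # approximate parameter count for a 120B model
for name, block in [("MXFP4", 32), ("NVFP4", 16)]:
    bpw = bits_per_weight(4, 8, block)
    print(f"{name}: {bpw} bits/weight ~= {params * bpw / 8 / 1e9:.0f} GB of weights")
```

At roughly 4.5 vs 4.25 bits per weight, an NVFP4 re-quant would, if anything, come out slightly larger than the MXFP4 original.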
