Model request

#1
by pathosethoslogos - opened

> If there are other models you're interested in seeing quantized to NVFP4 for use on the DGX Spark, or other modern Blackwell (or newer) cards, let me know

Sorry, unsure where I could let you know, so I'm posting here.

Would GPT OSS 120B NVFP4, upstage/Solar-Open-100B NVFP4, and IQuest Coder V1 40B Loop Thinking NVFP4 be possible?

This is a fine place to do it. I can give Solar Open 100B a shot and see whether the tools currently work for it (it appears to be a brand-new model, so there could be issues). As for GPT OSS 120B, I did attempt it a while back, but I kept getting a resulting quant that was larger than the original (240GB). I can try again now and see whether my updated quantization script, or newer versions of everything, produces a better result.
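For anyone curious, the core of such a script looks roughly like the sketch below. This is a minimal example assuming llm-compressor's NVFP4 scheme, not my exact script; the model ID, save directory, dataset, and calibration settings are all placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "upstage/Solar-Open-100B"  # placeholder; swap in the target model
SAVE_DIR = "Solar-Open-100B-NVFP4"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP4 weights and activations with FP8 block scales; lm_head stays in
# higher precision, as is typical for these quants.
recipe = QuantizationModifier(targets="Linear", scheme="NVFP4", ignore=["lm_head"])

# Calibration data is needed to fit the activation scales; the dataset
# and sample counts here are placeholders.
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```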

Thanks!

> GPT OSS 120B, I did attempt it a while back

I see, I thought something would be up. I looked at shanjiaz/gpt-oss-120b-nvfp4-modelopt's config.json, and the layer_types param made me assume they had to use a workaround.
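For reference, those fields can be checked without downloading the weights. A quick sketch using huggingface_hub; only layer_types is confirmed from that config, the quantization_config lookup is just the usual HF config convention:

```python
import json
from huggingface_hub import hf_hub_download

# Fetch only config.json from the repo mentioned above.
path = hf_hub_download("shanjiaz/gpt-oss-120b-nvfp4-modelopt", "config.json")
with open(path) as f:
    cfg = json.load(f)

# layer_types is the param that suggested a per-layer workaround;
# quantization_config (if present) shows how the quant was declared.
print(cfg.get("layer_types"))
print(json.dumps(cfg.get("quantization_config", {}), indent=2))
```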

> As for GPT OSS 120B, I did attempt it a while back, but I kept getting a resulting quant that was larger than the original (240GB).

Official GPT-OSS-120B is already in MXFP4, so there isn't much point in quantizing it to NVFP4. MXFP4 is also hardware-accelerated on Blackwell. Well, on B200 at least; on the RTX Pro 6000 it still uses the Marlin kernel.
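For a rough sense of the sizes involved, here's the back-of-the-envelope math for the two formats (a sketch; block sizes per the OCP MX spec and NVIDIA's NVFP4 format, and the 120B parameter count is approximate):

```python
# MXFP4: 4-bit E2M1 values, one 8-bit E8M0 scale per 32-element block.
# NVFP4: 4-bit E2M1 values, one 8-bit E4M3 scale per 16-element block
#        (plus a per-tensor FP32 scale, negligible at this size).
def bits_per_weight(elem_bits: int, scale_bits: int, block_size: int) -> float:
    return elem_bits + scale_bits / block_size

params = 120e9  # approximate parameter count for a 120B model
for name, block in [("MXFP4", 32), ("NVFP4", 16)]:
    bpw = bits_per_weight(4, 8, block)
    print(f"{name}: {bpw} bits/weight ~= {params * bpw / 8 / 1e9:.0f} GB of weights")
```

At roughly 4.5 vs 4.25 bits per weight, an NVFP4 re-quant would, if anything, come out slightly larger than the MXFP4 original.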
