it fails to run
The Qwen3.5-122B-A10B-Text-qx86-hi-mlx model takes up 114GB of storage after downloading. While I can deploy and load it locally, it fails to run properly: once it initializes, memory runs out and it displays the error 'The model has crashed without additional information. (Exit code: null).' In my experience, a Mac with 128GB of RAM can only deploy large models with a footprint of at most about 107GB. Regardless, thank you very much for your hard work.
Yeah, I figured that much. It is too big, I had the same issues. Well, at least we tried.
I will leave it up for a bit. For now, try the qx85; it is the closest you can get, it has very good metrics, and the vibe is excellent.
https://huggingface.co/nightmedia/Qwen3.5-122B-A10B-Text-qx85-mlx
I am tempted to try this model on my 128GB Mac, which I've set up as a dedicated LLM machine with 125GB of VRAM.
The issue is that the current model to beat is the 2-bit quant of Qwen3.5-397B-A17B, which is incredibly good in my experience. But it might be interesting to see how this model compares (a high quant of a smaller model versus a low quant of a larger one).
It won't run, it's too big, I have the same setup. The qx85 worked, barely
Except, it DOES run. I get around 40 tokens/second inference (not sure about prompt processing speed) with 96k context. I'll likely lower that to 64k, but we shall see. I haven't had a chance to check the quality much, but it seems good after a few initial tests.
To get it to run on your 128GB Mac, you need to be very aggressive with the VRAM setting. Did you raise your VRAM limit to 125GB?
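For anyone else hitting this: on Apple Silicon the GPU-wired memory cap can be raised at runtime with sysctl. A sketch, assuming a recent macOS (Sonoma or later) where the `iogpu.wired_limit_mb` key is available; the exact key can differ on older releases:

```shell
# Raise the GPU wired-memory limit to ~125 GB (the value is in MB).
# Assumes the iogpu.wired_limit_mb sysctl key exists (macOS Sonoma+).
# The setting resets to the default on reboot and requires sudo.
LIMIT_MB=$((125 * 1024))   # 125 GB expressed in MB

sudo sysctl iogpu.wired_limit_mb=$LIMIT_MB

# Verify the new limit took effect:
sysctl iogpu.wired_limit_mb
```

Leaving a few GB below the full 128GB keeps headroom for macOS itself; pushing the limit too close to physical RAM can make the system unstable.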
hah, that is something I did not try. amazing :)