Go to the llama.cpp releases page and download one of the prebuilt archives.

If you're going to use CUDA, check which CUDA version your card supports (12.2 for any RTX card) and download the matching archive.

Unpack everything into one folder, rename it to "LlamaCPP", and put that folder in the same directory as the main.py/main.exe file.
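
If you want to sanity-check the setup before launching, a minimal Python sketch like the one below can confirm the layout. It assumes the folder and file names from the steps above ("LlamaCPP" next to main.py); the binary names are an assumption based on what llama.cpp release archives typically ship (newer releases use llama-cli, older ones used main).

```python
# Minimal sketch: verify the llama.cpp binaries were unpacked correctly.
# Assumes the layout described above: a "LlamaCPP" folder sitting next to
# main.py. Folder and binary names here are assumptions, not a fixed API.
from pathlib import Path
import sys

llama_dir = Path(__file__).resolve().parent / "LlamaCPP"

if not llama_dir.is_dir():
    sys.exit("LlamaCPP folder not found next to main.py - unpack the release archive there.")

# Newer llama.cpp releases ship "llama-cli" ("llama-cli.exe" on Windows);
# older releases used "main"/"main.exe".
candidates = ["llama-cli", "llama-cli.exe", "main", "main.exe"]
binary = next((llama_dir / name for name in candidates if (llama_dir / name).is_file()), None)

if binary is None:
    sys.exit(f"No llama.cpp executable found in {llama_dir} - did you unpack everything?")

print(f"Found llama.cpp binary: {binary}")
```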
