---
license: apache-2.0
pipeline_tag: text-generation
library_name: node-llama-cpp
tags:
- node-llama-cpp
- llama.cpp
- conversational
base_model: openai/gpt-oss-120b
quantized_by: giladgd
---

# gpt-oss-120b-GGUF
> [!NOTE]
> Read [our guide](https://node-llama-cpp.withcat.ai/blog/v3.12-gpt-oss) on using `gpt-oss` to learn how to adjust its responses


# Highlights
* **Permissive Apache 2.0 license:** Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
* **Configurable reasoning effort:** Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
* **Full chain-of-thought:** Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users.
* **Fine-tunable:** Fully customize models to your specific use case through parameter fine-tuning.
* **Agentic capabilities:** Use the models’ native capabilities for function calling, [web browsing](https://github.com/openai/gpt-oss/tree/main?tab=readme-ov-file#browser), [Python code execution](https://github.com/openai/gpt-oss/tree/main?tab=readme-ov-file#python), and Structured Outputs.
* **Native MXFP4 quantization:** The models are trained with native MXFP4 precision for the MoE layer, making `gpt-oss-120b` run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the `gpt-oss-20b` model run within 16GB of memory.

> [!NOTE]
> Refer to the [original model card](https://huggingface.co/openai/gpt-oss-120b) for more details on the model

# Quants
| Link | [URI](https://node-llama-cpp.withcat.ai/cli/pull) | Size |
|:-----|:--------------------------------------------------|-----:|
| [GGUF](https://huggingface.co/giladgd/gpt-oss-120b-GGUF/resolve/main/gpt-oss-120b.MXFP4.gguf) | `hf:giladgd/gpt-oss-120b-GGUF/gpt-oss-120b.MXFP4-00001-of-00002.gguf` | 63.4GB |
| [GGUF](https://huggingface.co/giladgd/gpt-oss-120b-GGUF/resolve/main/gpt-oss-120b.F16.gguf) | `hf:giladgd/gpt-oss-120b-GGUF/gpt-oss-120b.F16-00001-of-00002.gguf` | 65.4GB |

> [!TIP]
> Download a quant using `node-llama-cpp` ([more info](https://node-llama-cpp.withcat.ai/cli/pull)):
> ```bash
> npx -y node-llama-cpp pull
> ```

# Usage
## Use with [`node-llama-cpp`](https://node-llama-cpp.withcat.ai) (recommended)

### CLI
Chat with the model:
```bash
npx -y node-llama-cpp chat hf:giladgd/gpt-oss-120b-GGUF/gpt-oss-120b.MXFP4-00001-of-00002.gguf
```

> [!NOTE]
> Ensure that you have `node.js` installed first:
> ```bash
> brew install nodejs
> ```

### Code
Use it in your node.js project:
```bash
npm install node-llama-cpp
```
```typescript
import {getLlama, resolveModelFile, LlamaChatSession} from "node-llama-cpp";

const modelUri = "hf:giladgd/gpt-oss-120b-GGUF/gpt-oss-120b.MXFP4-00001-of-00002.gguf";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: await resolveModelFile(modelUri)
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const q1 = "Hi there, how are you?";
console.log("User: " + q1);

const a1 = await session.prompt(q1);
console.log("AI: " + a1);
```

> [!TIP]
> Read the [getting started guide](https://node-llama-cpp.withcat.ai/guide/) to quickly scaffold a new `node-llama-cpp` project
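You can also stream the response as it is being generated instead of waiting for it to complete. A minimal sketch, reusing the `session` from the example above and assuming the `onTextChunk` prompt option fits your use case:

```typescript
const q2 = "Tell me a short story";
console.log("User: " + q2);

process.stdout.write("AI: ");
const a2 = await session.prompt(q2, {
    // called with each new chunk of generated text as it becomes available
    onTextChunk(chunk) {
        process.stdout.write(chunk);
    }
});
process.stdout.write("\n");
```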
#### Customize inference options
Set [Harmony](https://cookbook.openai.com/articles/openai-harmony) options using [`HarmonyChatWrapper`](https://node-llama-cpp.withcat.ai/api/classes/HarmonyChatWrapper):
```typescript
import {
    getLlama, resolveModelFile, LlamaChatSession,
    HarmonyChatWrapper, defineChatSessionFunction
} from "node-llama-cpp";

const modelUri = "hf:giladgd/gpt-oss-120b-GGUF/gpt-oss-120b.MXFP4-00001-of-00002.gguf";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: await resolveModelFile(modelUri)
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    chatWrapper: new HarmonyChatWrapper({
        modelIdentity: "You are ChatGPT, a large language model trained by OpenAI.",
        reasoningEffort: "high"
    })
});

const functions = {
    getCurrentWeather: defineChatSessionFunction({
        description: "Gets the current weather in the provided location.",
        params: {
            type: "object",
            properties: {
                location: {
                    type: "string",
                    description: "The city and state, e.g. San Francisco, CA"
                },
                format: {
                    enum: ["celsius", "fahrenheit"]
                }
            }
        },
        handler({location, format}) {
            console.log(`Getting current weather for "${location}" in ${format}`);

            return {
                // simulate a weather API response
                temperature: format === "celsius" ? 20 : 68,
                format
            };
        }
    })
};

const q1 = "What is the weather like in SF?";
console.log("User: " + q1);

const a1 = await session.prompt(q1, {functions});
console.log("AI: " + a1);
```

## Use with [llama.cpp](https://github.com/ggml-org/llama.cpp)
Install llama.cpp through brew (works on Mac and Linux):
```bash
brew install llama.cpp
```

### CLI
```bash
llama-cli --hf-repo giladgd/gpt-oss-120b-GGUF --hf-file gpt-oss-120b.MXFP4-00001-of-00002.gguf -p "The meaning of life and the universe is"
```

### Server
```bash
llama-server --hf-repo giladgd/gpt-oss-120b-GGUF --hf-file gpt-oss-120b.MXFP4-00001-of-00002.gguf -c 2048
```
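Once the server is running, you can query its OpenAI-compatible HTTP API from any client. A minimal sketch, assuming the default listen address of `http://127.0.0.1:8080`:

```typescript
// send a chat completion request to llama-server's OpenAI-compatible endpoint
const response = await fetch("http://127.0.0.1:8080/v1/chat/completions", {
    method: "POST",
    headers: {"Content-Type": "application/json"},
    body: JSON.stringify({
        messages: [
            {role: "user", content: "Hi there, how are you?"}
        ]
    })
});

const result = await response.json();
console.log(result.choices[0].message.content);
```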