---
license: apache-2.0
pipeline_tag: text-generation
library_name: node-llama-cpp
tags:
- node-llama-cpp
- llama.cpp
- conversational
base_model: openai/gpt-oss-120b
quantized_by: giladgd
---

# gpt-oss-120b-GGUF
> [!NOTE]
> Read [our guide](https://node-llama-cpp.withcat.ai/blog/v3.12-gpt-oss) on using `gpt-oss` to learn how to adjust its responses


# Highlights
* **Permissive Apache 2.0 license:** Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
* **Configurable reasoning effort:** Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
* **Full chain-of-thought:** Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users.
* **Fine-tunable:** Fully customize models to your specific use case through parameter fine-tuning.
* **Agentic capabilities:** Use the models’ native capabilities for function calling, [web browsing](https://github.com/openai/gpt-oss/tree/main?tab=readme-ov-file#browser), [Python code execution](https://github.com/openai/gpt-oss/tree/main?tab=readme-ov-file#python), and Structured Outputs.
* **Native MXFP4 quantization:** The models are trained with native MXFP4 precision for the MoE layer, making `gpt-oss-120b` run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the `gpt-oss-20b` model run within 16GB of memory.

> [!NOTE]
> Refer to the [original model card](https://huggingface.co/openai/gpt-oss-120b) for more details on the model

# Quants
| Link | [URI](https://node-llama-cpp.withcat.ai/cli/pull) | Size |
|:-----|:--------------------------------------------------|-----:|
| [GGUF](https://huggingface.co/giladgd/gpt-oss-120b-GGUF/resolve/main/gpt-oss-120b.MXFP4.gguf) | `hf:giladgd/gpt-oss-120b-GGUF/gpt-oss-120b.MXFP4-00001-of-00002.gguf` | 63.4GB |
| [GGUF](https://huggingface.co/giladgd/gpt-oss-120b-GGUF/resolve/main/gpt-oss-120b.F16.gguf) | `hf:giladgd/gpt-oss-120b-GGUF/gpt-oss-120b.F16-00001-of-00002.gguf` | 65.4GB |

> [!TIP]
> Download a quant using `node-llama-cpp` ([more info](https://node-llama-cpp.withcat.ai/cli/pull)):
> ```bash
> npx -y node-llama-cpp pull
> ```

# Usage
## Use with [`node-llama-cpp`](https://node-llama-cpp.withcat.ai) (recommended)

### CLI
Chat with the model:
```bash
npx -y node-llama-cpp chat hf:giladgd/gpt-oss-120b-GGUF/gpt-oss-120b.MXFP4-00001-of-00002.gguf
```

> [!NOTE]
> Ensure that you have `node.js` installed first:
> ```bash
> brew install nodejs
> ```

### Code
Use it in your node.js project:
```bash
npm install node-llama-cpp
```
```typescript
import {getLlama, resolveModelFile, LlamaChatSession} from "node-llama-cpp";

const modelUri = "hf:giladgd/gpt-oss-120b-GGUF/gpt-oss-120b.MXFP4-00001-of-00002.gguf";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: await resolveModelFile(modelUri)
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const q1 = "Hi there, how are you?";
console.log("User: " + q1);

const a1 = await session.prompt(q1);
console.log("AI: " + a1);
```

> [!TIP]
> Read the [getting started guide](https://node-llama-cpp.withcat.ai/guide/) to quickly scaffold a new `node-llama-cpp` project
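You can also stream the response as it is being generated instead of waiting for it to complete. A minimal sketch, reusing the `session` from the example above and assuming the `onTextChunk` prompt option fits your use case:

```typescript
const q2 = "Tell me a short story";
console.log("User: " + q2);

process.stdout.write("AI: ");
const a2 = await session.prompt(q2, {
    // called with each new chunk of generated text as it becomes available
    onTextChunk(chunk) {
        process.stdout.write(chunk);
    }
});
process.stdout.write("\n");
```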
#### Customize inference options
Set [Harmony](https://cookbook.openai.com/articles/openai-harmony) options using [`HarmonyChatWrapper`](https://node-llama-cpp.withcat.ai/api/classes/HarmonyChatWrapper):
```typescript
import {
    getLlama, resolveModelFile, LlamaChatSession,
    HarmonyChatWrapper, defineChatSessionFunction
} from "node-llama-cpp";

const modelUri = "hf:giladgd/gpt-oss-120b-GGUF/gpt-oss-120b.MXFP4-00001-of-00002.gguf";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: await resolveModelFile(modelUri)
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    chatWrapper: new HarmonyChatWrapper({
        modelIdentity: "You are ChatGPT, a large language model trained by OpenAI.",
        reasoningEffort: "high"
    })
});

const functions = {
    getCurrentWeather: defineChatSessionFunction({
        description: "Gets the current weather in the provided location.",
        params: {
            type: "object",
            properties: {
                location: {
                    type: "string",
                    description: "The city and state, e.g. San Francisco, CA"
                },
                format: {
                    enum: ["celsius", "fahrenheit"]
                }
            }
        },
        handler({location, format}) {
            console.log(`Getting current weather for "${location}" in ${format}`);

            return {
                // simulate a weather API response
                temperature: format === "celsius" ? 20 : 68,
                format
            };
        }
    })
};

const q1 = "What is the weather like in SF?";
console.log("User: " + q1);

const a1 = await session.prompt(q1, {functions});
console.log("AI: " + a1);
```

## Use with [llama.cpp](https://github.com/ggml-org/llama.cpp)
Install llama.cpp through brew (works on Mac and Linux):
```bash
brew install llama.cpp
```

### CLI
```bash
llama-cli --hf-repo giladgd/gpt-oss-120b-GGUF --hf-file gpt-oss-120b.MXFP4-00001-of-00002.gguf -p "The meaning of life and the universe is"
```

### Server
```bash
llama-server --hf-repo giladgd/gpt-oss-120b-GGUF --hf-file gpt-oss-120b.MXFP4-00001-of-00002.gguf -c 2048
```
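Once the server is running, you can query its OpenAI-compatible HTTP API from any client. A minimal sketch, assuming the default listen address of `http://127.0.0.1:8080`:

```typescript
// send a chat completion request to llama-server's OpenAI-compatible endpoint
const response = await fetch("http://127.0.0.1:8080/v1/chat/completions", {
    method: "POST",
    headers: {"Content-Type": "application/json"},
    body: JSON.stringify({
        messages: [
            {role: "user", content: "Hi there, how are you?"}
        ]
    })
});

const result = await response.json();
console.log(result.choices[0].message.content);
```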