THUNDER-AI-GGUF

THUNDER-AI-GGUF is a GGUF release of the THUNDER AI model for local inference.

Available model file

  • THUNDER-AI-R1 V1.2 1.5B.Q4_K_M.gguf

Ollama usage

Run the raw model directly from Hugging Face:

ollama run hf.co/EREN121232/THUNDER-AI-GGUF:Q4_K_M

Included helper files

  • Modelfile.thunder-clean
    • Builds a cleaned Ollama wrapper model that keeps the model's <think>...</think> reasoning tags from leaking into responses.
    • Also sets num_ctx 8192 for a larger working context.
  • ollama_memory_proxy.py
    • Optional local proxy for Ollama-compatible clients.
    • Adds lightweight conversation memory: it saves useful facts and preferences from user messages and injects relevant memories into later prompts.
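As a rough illustration of what a wrapper Modelfile like this contains, the sketch below is a guess, not the shipped Modelfile.thunder-clean; the FROM path uses the GGUF filename from this repo, and the stop parameter shows one common way to suppress <think> output (the actual file may use a different mechanism):

```
# Hypothetical sketch -- the shipped Modelfile.thunder-clean may differ.
# Quote or rename the GGUF file if your Ollama version dislikes spaces in FROM.
FROM ./THUNDER-AI-R1 V1.2 1.5B.Q4_K_M.gguf

# Larger working context, as described above.
PARAMETER num_ctx 8192

# One common way to keep <think>...</think> out of responses is a stop sequence.
PARAMETER stop "<think>"
```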

Build the cleaned Ollama model

ollama create thunder-ai-clean -f Modelfile.thunder-clean

Optional memory proxy usage

The memory proxy is meant for local setups where an app talks to Ollama through an HTTP endpoint.

Set these environment variables if you want to customize it:

  • THUNDER_REAL_OLLAMA_BASE_URL
  • THUNDER_PROXY_HOST
  • THUNDER_PROXY_PORT
  • THUNDER_MEMORY_FILE
  • THUNDER_MEMORY_MAX
  • THUNDER_MEMORY_INJECT_MAX
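A sketch of how a proxy script might read these variables. The host, port, and upstream defaults mirror the documented behavior (listen on 127.0.0.1:11435, forward to 127.0.0.1:11434); the memory-related defaults are assumptions, not values taken from ollama_memory_proxy.py:

```python
import os

# Listen/forward defaults follow the documented behaviour of the proxy;
# the memory file name and limits below are illustrative guesses.
REAL_BASE   = os.environ.get("THUNDER_REAL_OLLAMA_BASE_URL", "http://127.0.0.1:11434")
HOST        = os.environ.get("THUNDER_PROXY_HOST", "127.0.0.1")
PORT        = int(os.environ.get("THUNDER_PROXY_PORT", "11435"))
MEMORY_FILE = os.environ.get("THUNDER_MEMORY_FILE", "thunder_memories.json")
MEMORY_MAX  = int(os.environ.get("THUNDER_MEMORY_MAX", "200"))
INJECT_MAX  = int(os.environ.get("THUNDER_MEMORY_INJECT_MAX", "5"))
```

Leaving every variable unset keeps the documented defaults, so the proxy works out of the box against a local Ollama install.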

Then run:

python ollama_memory_proxy.py

By default it listens on 127.0.0.1:11435 and forwards requests to Ollama on 127.0.0.1:11434.
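Clients that speak the Ollama HTTP API can simply point their base URL at the proxy instead of at Ollama. A minimal Python sketch, assuming the proxy forwards Ollama's standard /api/chat endpoint unchanged and that you built the thunder-ai-clean wrapper above:

```python
import json
import urllib.request

PROXY_BASE = "http://127.0.0.1:11435"  # the proxy's default listen address

def build_chat_request(prompt, model="thunder-ai-clean", base=PROXY_BASE):
    """Build an Ollama-style /api/chat request aimed at the memory proxy."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        base + "/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# To actually send it (requires Ollama and the proxy to be running):
# with urllib.request.urlopen(build_chat_request("Hello!")) as resp:
#     print(json.loads(resp.read())["message"]["content"])
```

Because the proxy sits between the client and Ollama, requests sent this way get relevant saved memories injected before they reach the model.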

Notes

  • This repo is for local GGUF usage.
  • Machine-specific launcher scripts are intentionally not included because they depend on local Windows paths and drive layout.
  • The model was fine-tuned and exported with Unsloth.
  • Model details: GGUF format, 2B params, qwen2 architecture, 4-bit Q4_K_M quantization.