THUNDER-AI-GGUF

THUNDER-AI-GGUF is a GGUF release of the THUNDER AI model for local inference.

Available model file

  • THUNDER-AI-R1 V1.2 1.5B.Q4_K_M.gguf

Ollama usage

Run the raw model directly from Hugging Face:

ollama run hf.co/EREN121232/THUNDER-AI-GGUF:Q4_K_M

Included helper files

  • Modelfile.thunder-clean
    • Builds a cleaned Ollama wrapper model that keeps the model's <think>...</think> reasoning tags from leaking into responses.
    • Also sets num_ctx 8192 for a larger working context.
  • ollama_memory_proxy.py
    • Optional local proxy for Ollama-compatible clients.
    • Adds lightweight conversation memory: it saves useful facts and preferences from user messages and injects relevant memories into later prompts.
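As a rough illustration of what a wrapper Modelfile like this contains, the sketch below is a guess, not the shipped Modelfile.thunder-clean; the FROM path uses the GGUF filename from this repo, and the stop parameter shows one common way to suppress <think> output (the actual file may use a different mechanism):

```
# Hypothetical sketch -- the shipped Modelfile.thunder-clean may differ.
# Quote or rename the GGUF file if your Ollama version dislikes spaces in FROM.
FROM ./THUNDER-AI-R1 V1.2 1.5B.Q4_K_M.gguf

# Larger working context, as described above.
PARAMETER num_ctx 8192

# One common way to keep <think>...</think> out of responses is a stop sequence.
PARAMETER stop "<think>"
```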

Build the cleaned Ollama model

ollama create thunder-ai-clean -f Modelfile.thunder-clean

Optional memory proxy usage

The memory proxy is meant for local setups where an app talks to Ollama through an HTTP endpoint.

Set these environment variables if you want to customize it:

  • THUNDER_REAL_OLLAMA_BASE_URL
  • THUNDER_PROXY_HOST
  • THUNDER_PROXY_PORT
  • THUNDER_MEMORY_FILE
  • THUNDER_MEMORY_MAX
  • THUNDER_MEMORY_INJECT_MAX
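A sketch of how a proxy script might read these variables. The host, port, and upstream defaults mirror the documented behavior (listen on 127.0.0.1:11435, forward to 127.0.0.1:11434); the memory-related defaults are assumptions, not values taken from ollama_memory_proxy.py:

```python
import os

# Listen/forward defaults follow the documented behaviour of the proxy;
# the memory file name and limits below are illustrative guesses.
REAL_BASE   = os.environ.get("THUNDER_REAL_OLLAMA_BASE_URL", "http://127.0.0.1:11434")
HOST        = os.environ.get("THUNDER_PROXY_HOST", "127.0.0.1")
PORT        = int(os.environ.get("THUNDER_PROXY_PORT", "11435"))
MEMORY_FILE = os.environ.get("THUNDER_MEMORY_FILE", "thunder_memories.json")
MEMORY_MAX  = int(os.environ.get("THUNDER_MEMORY_MAX", "200"))
INJECT_MAX  = int(os.environ.get("THUNDER_MEMORY_INJECT_MAX", "5"))
```

Leaving every variable unset keeps the documented defaults, so the proxy works out of the box against a local Ollama install.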

Then run:

python ollama_memory_proxy.py

By default it listens on 127.0.0.1:11435 and forwards requests to Ollama on 127.0.0.1:11434.
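Clients that speak the Ollama HTTP API can simply point their base URL at the proxy instead of at Ollama. A minimal Python sketch, assuming the proxy forwards Ollama's standard /api/chat endpoint unchanged and that you built the thunder-ai-clean wrapper above:

```python
import json
import urllib.request

PROXY_BASE = "http://127.0.0.1:11435"  # the proxy's default listen address

def build_chat_request(prompt, model="thunder-ai-clean", base=PROXY_BASE):
    """Build an Ollama-style /api/chat request aimed at the memory proxy."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        base + "/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# To actually send it (requires Ollama and the proxy to be running):
# with urllib.request.urlopen(build_chat_request("Hello!")) as resp:
#     print(json.loads(resp.read())["message"]["content"])
```

Because the proxy sits between the client and Ollama, requests sent this way get relevant saved memories injected before they reach the model.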

Notes

  • This repo is for local GGUF usage.
  • Machine-specific launcher scripts are intentionally not included because they depend on local Windows paths and drive layout.
  • The model was fine-tuned and exported with Unsloth.
  • Model details: GGUF format, 2B params, qwen2 architecture, 4-bit Q4_K_M quantization.