Evrmind EVR-1 Bafethu-8b-Reasoning (DeepSeek R1 Distilled)
DeepSeek-R1-Distill-Llama-8B compressed using EVR-1 (Evrmind Reconstruction), a novel compression method developed independently by Evrmind. The compressed weights average approximately 3 bits per parameter; the total GGUF file (~3.9 GiB) includes additional metadata and structure overhead. A reasoning model that thinks step-by-step before answering.
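As a rough sanity check of those numbers (assuming the commonly published ~8.03B parameter count for Llama-3.1-8B class models), the weights alone at ~3 bits per parameter land well under the file size, with the remainder being the metadata, structure, and any tensors the format keeps at higher precision:

```shell
# Back-of-the-envelope size check (parameter count is an assumption, not
# taken from this repo): 8.03B params at 3 bits each, converted to GiB.
awk 'BEGIN {
  params = 8.03e9
  bits_per_param = 3
  gib = params * bits_per_param / 8 / (1024 ^ 3)
  printf "weights alone at 3 bits/param: %.2f GiB\n", gib
}'
# -> weights alone at 3 bits/param: 2.80 GiB
```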
EVR-1 Bafethu achieves 0.44% repetition at 500 tokens and 1.75% at 1000 tokens, maintaining coherent chain-of-thought reasoning while being nearly 4x smaller than F16.
~3.9 GiB | DeepSeek R1 Reasoning | Runs on laptops, desktops, and Android (Termux)
Note: HuggingFace may display an incorrect parameter count in the sidebar due to the custom compression format. EVR-1 is not a standard quantization (not Q2, Q3, Q4, etc).
Setup
You need two things: the model files (from this HuggingFace repo) and a platform binary (from GitHub).
Step 1: Clone this repo or download the files:
# Option A: Clone everything (3.93 GiB / ~4.2 GB download, requires git-lfs)
git lfs install
git clone https://huggingface.co/evrmind/evr-1-bafethu-8b-reasoning
cd evr-1-bafethu-8b-reasoning
# Option B: Or download individual files from the "Files" tab above
Step 2: Download the binary for your platform from the Downloads table. Save the archive into the evr-1-bafethu-8b-reasoning directory, then extract it:
# Linux + NVIDIA
mkdir -p linux-cuda && tar xzf evrmind-linux-cuda.tar.gz -C linux-cuda
# Linux + Vulkan
mkdir -p linux-vulkan && tar xzf evrmind-linux-vulkan.tar.gz -C linux-vulkan
# macOS (Apple Silicon)
mkdir -p metal && tar xzf evrmind-macos-metal.tar.gz -C metal
# Android (Termux)
mkdir -p android-vulkan && tar xzf evrmind-android-vulkan.tar.gz -C android-vulkan
For Windows, extract the .zip into a folder with the matching name (e.g., extract evrmind-windows-cuda.zip into a folder called windows-cuda).
After completing both steps, your directory should look like this:
evr-1-bafethu-8b-reasoning/
evr-deepseek-r1-llama-8b-reasoning.gguf <-- model weights
start-server.sh <-- Linux/macOS/Android launcher
start-server.bat <-- Windows launcher
webui/ <-- browser interface
linux-cuda/ <-- extracted platform binary (example)
llama-server
llama-cli
llama-completion
...
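A quick way to confirm the layout is a small shell check. This is a hypothetical helper, not shipped with the repo; extend the file list with the platform folder you actually extracted (e.g. linux-cuda/llama-server):

```shell
# Hypothetical helper: verify the key files are in place before launching.
check_layout() {
  dir="${1:-.}"
  for f in evr-deepseek-r1-llama-8b-reasoning.gguf start-server.sh webui; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $f"
      return 1
    fi
  done
  echo "layout ok"
}
check_layout . || true   # run from the repo root
```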
Web UI
Linux, macOS, Android (Termux):
./start-server.sh
# Open http://localhost:8080
Windows:
Double-click start-server.bat, or from Command Prompt:
start-server.bat
Then open http://localhost:8080 in your browser.
Network access (phone, tablet, other devices on the same WiFi):
./start-server.sh --network
The script will print the URL to open on other devices. The model runs on your computer; other devices just connect to the web UI. The --network and --cpu flags are only available in start-server.sh (Linux/macOS/Android).
See WEB_UI.md for more options and troubleshooting.
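The Web UI is backed by llama-server, so you can also talk to it from scripts. A sketch, assuming the bundled llama-server exposes llama.cpp's standard OpenAI-compatible /v1/chat/completions endpoint on the default port (start-server.sh must already be running):

```shell
# Query the running server programmatically (assumes llama.cpp's standard
# OpenAI-compatible HTTP API; the prompt and max_tokens are illustrative).
payload='{"messages":[{"role":"user","content":"What is 15% of 240?"}],"max_tokens":512}'
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$payload" || true
```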
Quick Start (CLI)
These examples assume you have completed Setup and are in the repo directory.
Linux + NVIDIA GPU:
cd linux-cuda
LD_LIBRARY_PATH=. ./llama-cli -m ../evr-deepseek-r1-llama-8b-reasoning.gguf -ngl 99
macOS (Apple Silicon):
cd metal
./llama-cli -m ../evr-deepseek-r1-llama-8b-reasoning.gguf -ngl 99
Linux + Vulkan:
cd linux-vulkan
LD_LIBRARY_PATH=. ./llama-cli -m ../evr-deepseek-r1-llama-8b-reasoning.gguf -ngl 99
Android (Termux):
cd android-vulkan
LD_LIBRARY_PATH=. ./llama-cli -m ../evr-deepseek-r1-llama-8b-reasoning.gguf -ngl 99
Windows + NVIDIA (Command Prompt):
cd windows-cuda
llama-cli.exe -m ..\evr-deepseek-r1-llama-8b-reasoning.gguf -ngl 99
Windows + Vulkan (Command Prompt):
cd windows-vulkan
llama-cli.exe -m ..\evr-deepseek-r1-llama-8b-reasoning.gguf -ngl 99
CPU-only (no GPU):
Use -ngl 0 instead of -ngl 99 on any platform. Roughly 5-10x slower but works on any machine.
Downloads
| Platform | Download | GPU |
|---|---|---|
| Linux + NVIDIA | evrmind-linux-cuda.tar.gz | CUDA 12 |
| Linux + Any GPU | evrmind-linux-vulkan.tar.gz | Vulkan |
| Windows + NVIDIA | evrmind-windows-cuda.zip | CUDA 12 |
| Windows + Any GPU | evrmind-windows-vulkan.zip | Vulkan |
| macOS (Apple Silicon) | evrmind-macos-metal.tar.gz | Apple Silicon |
| Android (Termux) | evrmind-android-vulkan.tar.gz | Vulkan |
The model weights (evr-deepseek-r1-llama-8b-reasoning.gguf, 3.93 GiB / ~4.2 GB download) are available from the Files tab on this HuggingFace page. Platform binaries are hosted on GitHub Releases. You can verify downloads with SHA256SUMS.txt.
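Verification can be scripted with GNU coreutils. A self-contained sketch of the workflow (substitute the published SHA256SUMS.txt and your downloaded files for the demo file created here):

```shell
# Demo of the verification workflow: generate a checksum file, then verify
# against it, exactly as you would with the published SHA256SUMS.txt.
tmp=$(mktemp -d)
printf 'example weights' > "$tmp/model.gguf"
( cd "$tmp" && sha256sum model.gguf > SHA256SUMS.txt )
( cd "$tmp" && sha256sum -c SHA256SUMS.txt )   # prints "model.gguf: OK"
```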
Note: The binaries are the same for the base, instruct, and reasoning models. You only need to download them once. Just point them at whichever GGUF you want to run.
How Reasoning Works
The model uses DeepSeek R1's reasoning format. It first thinks through the problem internally, then provides a clean answer:
<think>
To find 15% of 240, I need to multiply 240 by 0.15.
240 x 0.15 = 36
</think>
15% of 240 is **36**.
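If you consume the raw output in scripts, the reasoning block can be stripped to keep only the final answer. A minimal sed sketch; real think blocks may span many lines, which the range delete handles:

```shell
# Strip the <think>...</think> reasoning block, keeping only the final answer.
raw='<think>
To find 15% of 240, I need to multiply 240 by 0.15.
240 x 0.15 = 36
</think>
15% of 240 is **36**.'
answer=$(printf '%s\n' "$raw" | sed '/<think>/,/<\/think>/d' | sed '/^[[:space:]]*$/d')
echo "$answer"   # -> 15% of 240 is **36**.
```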
Why EVR-1 Bafethu-8b-Reasoning?
This is a reasoning model compressed to under 4 GiB using EVR-1 (Evrmind Reconstruction). The model thinks through problems step-by-step using <think>...</think> tags before providing a final answer. Useful for math, logic, coding, and complex questions.
EVR-1 compresses the model to under 4 GiB while maintaining coherent chain-of-thought reasoning at 1000+ tokens. In our tests (5 continuation-style prompts), EVR-1 Bafethu achieved 0.44% repetition at 500 tokens and 1.75% at 1000 tokens.
Benchmarks
Coherence (lower is better)
Average 4-gram repetition rate, 5 continuation-style prompts:
| Model | Size | rep4 @ 500 | rep4 @ 1000 |
|---|---|---|---|
| EVR-1 Bafethu | 3.93 GiB | 0.44% | 1.75% |
Perplexity
| Model | Size | Perplexity (wikitext-2, ctx=512) |
|---|---|---|
| DeepSeek-R1-Distill-Llama-8B Q4_K_M | 4.69 GiB | 14.39 |
| EVR-1 Bafethu | 3.93 GiB | 14.40 |
EVR-1 Bafethu matches Q4_K_M perplexity while being 16% smaller (3.93 GiB vs 4.69 GiB). DeepSeek-R1-Distill-Llama-8B has higher perplexity on raw text benchmarks than the base Llama 3.1 8B, as expected for a model distilled for reasoning tasks.
Coherence tested with 5 continuation-style prompts at 500 and 1000 tokens each, temperature 0, no repeat penalty. See BENCHMARK_RESULTS.md for full coherence results and sample outputs.
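For reference, a 4-gram repetition rate can be computed along these lines. This is a plausible reconstruction, not the exact metric from BENCHMARK_RESULTS.md, which may differ in tokenization or normalization: the fraction of word 4-grams that have already appeared earlier in the text.

```shell
# Sketch of a 4-gram repetition rate over whitespace-split words.
rep4() {
  printf '%s' "$1" | tr -s '[:space:]' '\n' | awk '
    { w[NR] = $0 }
    END {
      total = 0; rep = 0
      for (i = 4; i <= NR; i++) {
        g = w[i-3] " " w[i-2] " " w[i-1] " " w[i]
        total++
        if (g in seen) rep++
        seen[g] = 1
      }
      if (total > 0) printf "%.2f%%\n", 100 * rep / total
    }'
}
rep4 "the cat sat on the mat the cat sat on the mat"   # -> 33.33%
```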
Limitations
- Context window has been tested up to 2048 tokens. Longer contexts may work but have not been validated at 3-bit compression.
- Occasional minor character-level artefacts due to 3-bit compression.
- Reasoning chains may occasionally be incomplete or circular.
- As with all heavily quantized models, generated text may contain factual inaccuracies (e.g., incorrect numbers, dates, or scientific details). Always verify factual claims independently.
System Requirements
- Storage: ~4 GiB for model weights + ~50 MB for binaries
- RAM: 6 GiB minimum (8 GiB recommended)
- GPU (recommended): NVIDIA (CUDA 12), Apple Silicon, or any Vulkan GPU
- CPU-only: Supported but slower (use the -ngl 0 or --cpu flag)
- OS: Linux, macOS (Apple Silicon), Windows, Android (Termux)
- Not supported: iOS, 32-bit systems
Safety and Responsible Use
This model can generate incorrect, biased, or harmful content. Reasoning chains may contain errors or circular logic. Users should apply appropriate content filtering for user-facing applications. See MODEL_CARD.md for details.
Derivative Works
If you create derivative works, credit "EVR-1 Bafethu" in your model name and documentation. Commercial use is permitted subject to the Llama 3.1 Community License Agreement and DeepSeek MIT License.
License
This model is subject to three licenses:
- Evrmind Free License 1.0: Covers the EVR-1 compression and distribution. Permits personal, research, and commercial use with attribution.
- DeepSeek MIT License: Covers the DeepSeek R1 distillation. Permissive open-source license.
- Llama 3.1 Community License: Covers the underlying Llama architecture. Permits commercial use for entities with fewer than 700 million monthly active users.
All three licenses apply. See LICENSE.md, DEEPSEEK_LICENSE.md, and META_LLAMA_LICENSE.md for full terms.
Also Available
- EVR-1 Maano-8b, base model for text completion
- EVR-1 Maano-8b-Instruct, instruction-following chat model
Contact
- Email: hello@evrmind.io
- Issues: GitHub