---
license: mit
language:
- tr
- en
library_name: gguf
tags:
- kubernetes
- devops
- quantized
- gguf
- gemma3
- llama-cpp
- ollama
base_model: aciklab/kubernetes-ai
model_type: gemma3
quantized_by: aciklab
---

# Kubernetes AI - GGUF Quantized Models

A fine-tuned Gemma 3 12B model specialized for answering Kubernetes questions in Turkish, quantized to GGUF format for efficient local inference.

## Model Description

This repository contains GGUF quantized versions of the Kubernetes AI model, optimized for running on consumer hardware without a GPU. The model was built from LoRA adapters fine-tuned on unsloth/gemma-3-12b-it-qat-bnb-4bit, merged into the base model, and converted to GGUF format for llama.cpp compatibility.

**Primary Purpose:** Answer Kubernetes-related questions in Turkish on local machines.

## Available Models

| Model | Size | Download |
|-------|------|----------|
| **Unquantized** | 22.0 GB | [kubernetes-ai.gguf](https://huggingface.co/aciklab/kubernetes-ai-GGUF/resolve/main/kubernetes-ai.gguf) |
| **Q8_0** | 12.5 GB | [kubernetes-ai-Q8_0.gguf](https://huggingface.co/aciklab/kubernetes-ai-GGUF/resolve/main/kubernetes-ai-Q8_0.gguf) |
| **Q5_K_M** | 8.45 GB | [kubernetes-ai-Q5_K_M.gguf](https://huggingface.co/aciklab/kubernetes-ai-GGUF/resolve/main/kubernetes-ai-Q5_K_M.gguf) |
| **Q4_K_M** | 7.3 GB | [kubernetes-ai-Q4_K_M.gguf](https://huggingface.co/aciklab/kubernetes-ai-GGUF/resolve/main/kubernetes-ai-Q4_K_M.gguf) |
| **Q4_K_S** | 6.9 GB | [kubernetes-ai-Q4_K_S.gguf](https://huggingface.co/aciklab/kubernetes-ai-GGUF/resolve/main/kubernetes-ai-Q4_K_S.gguf) |
| **Q3_K_M** | 6.0 GB | [kubernetes-ai-Q3_K_M.gguf](https://huggingface.co/aciklab/kubernetes-ai-GGUF/resolve/main/kubernetes-ai-Q3_K_M.gguf) |
| **IQ3_M** | 5.6 GB | [kubernetes-ai-IQ3_M.gguf](https://huggingface.co/aciklab/kubernetes-ai-GGUF/resolve/main/kubernetes-ai-IQ3_M.gguf) |

**Recommended:** Q4_K_M for the best balance of quality and size, or IQ3_M for low-end systems.

## Quick Start

### Using Ollama (Recommended)

Ollama provides the easiest way to run GGUF models locally.

#### 1. Install Ollama

```bash
# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS
brew install ollama

# Windows - Download from https://ollama.com/download
```

#### 2. Download Model

```bash
# Download your preferred quantization
wget https://huggingface.co/aciklab/kubernetes-ai-GGUF/resolve/main/kubernetes-ai-Q4_K_M.gguf
```

#### 3. Create Modelfile

```bash
cat > Modelfile << 'EOF'
# Point FROM at the GGUF file downloaded in step 2
FROM ./kubernetes-ai-Q4_K_M.gguf

TEMPLATE """{{ if .System }}<start_of_turn>system
{{ .System }}<end_of_turn>
{{ end }}{{ if .Prompt }}<start_of_turn>user
{{ .Prompt }}<end_of_turn>
{{ end }}<start_of_turn>model
{{ .Response }}<end_of_turn>
"""

# Model parameters
PARAMETER temperature 1.0
PARAMETER top_p 0.95
PARAMETER top_k 64
PARAMETER repeat_penalty 1.05
PARAMETER stop "<start_of_turn>"
PARAMETER stop "<end_of_turn>"

# System prompt (Turkish): "You are an AI assistant specialized in Kubernetes.
# You answer Kubernetes-related questions in Turkish."
SYSTEM """Sen Kubernetes konusunda uzmanlaşmış bir yapay zeka asistanısın. Kubernetes ile ilgili soruları Türkçe olarak yanıtlıyorsun."""
EOF
```

#### 4. Create and Run Model

```bash
# Create the model
ollama create kubernetes-ai -f Modelfile

# Run an interactive chat
ollama run kubernetes-ai

# Example query ("How do I create a deployment with 3 replicas in Kubernetes?")
ollama run kubernetes-ai "Kubernetes'te 3 replikaya sahip bir deployment nasıl oluştururum?"
```
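### Using llama.cpp (Alternative)

If you already have llama.cpp, the GGUF files can also be run directly without Ollama. The command below is a minimal sketch: it assumes a recent llama.cpp build (which ships the `llama-cli` binary) and the Q4_K_M file downloaded above; the sampling options simply mirror the Modelfile parameters.

```bash
# One-shot generation with llama.cpp's CLI.
# Add "-ngl 99" to offload layers to a GPU if llama.cpp was built with GPU support.
./llama-cli \
  -m ./kubernetes-ai-Q4_K_M.gguf \
  -p "Kubernetes'te 3 replikaya sahip bir deployment nasıl oluştururum?" \
  -n 512 \
  --temp 1.0 --top-p 0.95 --top-k 64 --repeat-penalty 1.05
```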
## Training Details

This model is based on the [aciklab/kubernetes-ai](https://huggingface.co/aciklab/kubernetes-ai) LoRA adapters:

- **Base Model:** unsloth/gemma-3-12b-it-qat-bnb-4bit
- **Training Method:** LoRA (Low-Rank Adaptation)
- **LoRA Rank:** 8
- **Target Modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Training Dataset:** ~157,210 examples from Kubernetes docs, Stack Overflow, and DevOps datasets
- **Training Time:** 28 hours on an NVIDIA RTX 5070 12GB
- **Max Sequence Length:** 1024 tokens

### Training Dataset Summary

| Dataset Category | Count | Description |
|-----------------|-------|-------------|
| **Kubernetes Official Docs** | 8,910 | Concepts, kubectl, setup, tasks, tutorials |
| **Stack Overflow** | 52,000 | Kubernetes Q&A from the community |
| **DevOps Datasets** | 62,500 | General DevOps and Kubernetes content |
| **Configurations & CLI** | 36,800 | Kubernetes configs, kubectl examples, operators |
| **Total** | **~157,210** | Comprehensive Kubernetes knowledge base |

## Quantization Details

All models were quantized using llama.cpp with importance matrix optimization (a minimal reproduction sketch is included at the end of this card):

- **Source:** Merged LoRA adapters with the base model
- **Quantization Tool:** llama.cpp (latest)
- **Method:** K-quant and IQ-quant mixtures
- **Optimization:** Importance matrix for better quality

### Quantization Quality

- **Q4_K_M:** Best balance - recommended for most users
- **Q4_K_S:** Slightly smaller with minimal quality loss
- **Q3_K_M:** Good for memory-constrained systems
- **IQ3_M:** Advanced 3-bit quantization for laptops
- **Unquantized:** Original F16/F32 precision

## Hardware Requirements

### Minimum

- **CPU:** 4+ cores
- **RAM:** 8 GB (for IQ3_M/Q3_K_M quantizations)
- **Storage:** 6-8 GB free space
- **GPU:** Not required (CPU inference)

### Recommended

- **CPU:** 8+ cores
- **RAM:** 16 GB (for Q4_K_M/Q4_K_S quantizations)
- **Storage:** 10 GB free space
- **GPU:** Optional (can accelerate inference)

## License

This model is released under the **MIT License**. It is free to use in commercial and open-source projects.

## Contact

**Produced by:** HAVELSAN/Açıklab

For questions or feedback, please open an issue on the model repository.

---

**Note:** These are GGUF quantized versions ready for immediate use. No additional model loading or merging is required.
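**Reproduction sketch:** the importance-matrix quantization workflow described under *Quantization Details* can be reproduced roughly as follows. This is a hypothetical outline rather than the exact commands used for this repository: it assumes a recent llama.cpp build (providing the `llama-imatrix` and `llama-quantize` tools), and `calibration.txt` is an illustrative placeholder for a plain-text calibration corpus.

```bash
# 1. Compute an importance matrix from calibration text using the unquantized GGUF.
./llama-imatrix -m ./kubernetes-ai.gguf -f ./calibration.txt -o ./kubernetes-ai.imatrix

# 2. Quantize to Q4_K_M, using the importance matrix to preserve quality at low bit widths.
./llama-quantize --imatrix ./kubernetes-ai.imatrix \
  ./kubernetes-ai.gguf ./kubernetes-ai-Q4_K_M.gguf Q4_K_M
```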