---
license: mit
language:
- tr
- en
library_name: gguf
tags:
- kubernetes
- devops
- quantized
- gguf
- gemma3
- llama-cpp
- ollama
base_model: aciklab/kubernetes-ai
model_type: gemma3
quantized_by: aciklab
---

# Kubernetes AI - GGUF Quantized Models

A fine-tuned Gemma 3 12B model specialized for answering Kubernetes questions in Turkish, quantized to GGUF format for efficient local inference.

## Model Description

This repository contains GGUF quantized versions of the Kubernetes AI model, optimized for running on consumer hardware without a GPU. The model was built from LoRA adapters fine-tuned on unsloth/gemma-3-12b-it-qat-bnb-4bit, merged into the base model, and converted to GGUF format for llama.cpp compatibility.

**Primary Purpose:** Answer Kubernetes-related questions in Turkish on local machines.

## Available Models

| Model | Size | Download |
|-------|------|----------|
| **Unquantized** | 22.0 GB | [kubernetes-ai.gguf](https://huggingface.co/aciklab/kubernetes-ai-GGUF/resolve/main/kubernetes-ai.gguf) |
| **Q8_0** | 12.5 GB | [kubernetes-ai-Q8_0.gguf](https://huggingface.co/aciklab/kubernetes-ai-GGUF/resolve/main/kubernetes-ai-Q8_0.gguf) |
| **Q5_K_M** | 8.45 GB | [kubernetes-ai-Q5_K_M.gguf](https://huggingface.co/aciklab/kubernetes-ai-GGUF/resolve/main/kubernetes-ai-Q5_K_M.gguf) |
| **Q4_K_M** | 7.3 GB | [kubernetes-ai-Q4_K_M.gguf](https://huggingface.co/aciklab/kubernetes-ai-GGUF/resolve/main/kubernetes-ai-Q4_K_M.gguf) |
| **Q4_K_S** | 6.9 GB | [kubernetes-ai-Q4_K_S.gguf](https://huggingface.co/aciklab/kubernetes-ai-GGUF/resolve/main/kubernetes-ai-Q4_K_S.gguf) |
| **Q3_K_M** | 6.0 GB | [kubernetes-ai-Q3_K_M.gguf](https://huggingface.co/aciklab/kubernetes-ai-GGUF/resolve/main/kubernetes-ai-Q3_K_M.gguf) |
| **IQ3_M** | 5.6 GB | [kubernetes-ai-IQ3_M.gguf](https://huggingface.co/aciklab/kubernetes-ai-GGUF/resolve/main/kubernetes-ai-IQ3_M.gguf) |

**Recommended:** Q4_K_M for the best balance of quality and size, or IQ3_M for low-end systems.

## Quick Start

### Using Ollama (Recommended)

Ollama provides the easiest way to run GGUF models locally.

#### 1. Install Ollama

```bash
# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS
brew install ollama

# Windows - Download from https://ollama.com/download
```

#### 2. Download Model

```bash
# Download your preferred quantization
wget https://huggingface.co/aciklab/kubernetes-ai-GGUF/resolve/main/kubernetes-ai-Q4_K_M.gguf
```

#### 3. Create Modelfile

```bash
cat > Modelfile << 'EOF'
# Point FROM at the GGUF file downloaded in step 2
FROM ./kubernetes-ai-Q4_K_M.gguf

TEMPLATE """{{ if .System }}<start_of_turn>system
{{ .System }}<end_of_turn>
{{ end }}{{ if .Prompt }}<start_of_turn>user
{{ .Prompt }}<end_of_turn>
{{ end }}<start_of_turn>model
{{ .Response }}<end_of_turn>
"""

# Model parameters
PARAMETER temperature 1.0
PARAMETER top_p 0.95
PARAMETER top_k 64
PARAMETER repeat_penalty 1.05
PARAMETER stop "<start_of_turn>"
PARAMETER stop "<end_of_turn>"

# System prompt (Turkish): "You are an AI assistant specialized in Kubernetes.
# You answer Kubernetes-related questions in Turkish."
SYSTEM """Sen Kubernetes konusunda uzmanlaşmış bir yapay zeka asistanısın. Kubernetes ile ilgili soruları Türkçe olarak yanıtlıyorsun."""
EOF
```

#### 4. Create and Run Model

```bash
# Create the model
ollama create kubernetes-ai -f Modelfile

# Run an interactive chat
ollama run kubernetes-ai

# Example query ("How do I create a deployment with 3 replicas in Kubernetes?")
ollama run kubernetes-ai "Kubernetes'te 3 replikaya sahip bir deployment nasıl oluştururum?"
```
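### Using llama.cpp (Alternative)

If you already have llama.cpp, the GGUF files can also be run directly without Ollama. The command below is a minimal sketch: it assumes a recent llama.cpp build (which ships the `llama-cli` binary) and the Q4_K_M file downloaded above; the sampling options simply mirror the Modelfile parameters.

```bash
# One-shot generation with llama.cpp's CLI.
# Add "-ngl 99" to offload layers to a GPU if llama.cpp was built with GPU support.
./llama-cli \
  -m ./kubernetes-ai-Q4_K_M.gguf \
  -p "Kubernetes'te 3 replikaya sahip bir deployment nasıl oluştururum?" \
  -n 512 \
  --temp 1.0 --top-p 0.95 --top-k 64 --repeat-penalty 1.05
```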
## Training Details

This model is based on the [aciklab/kubernetes-ai](https://huggingface.co/aciklab/kubernetes-ai) LoRA adapters:

- **Base Model:** unsloth/gemma-3-12b-it-qat-bnb-4bit
- **Training Method:** LoRA (Low-Rank Adaptation)
- **LoRA Rank:** 8
- **Target Modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Training Dataset:** ~157,210 examples from Kubernetes docs, Stack Overflow, and DevOps datasets
- **Training Time:** 28 hours on an NVIDIA RTX 5070 12GB
- **Max Sequence Length:** 1024 tokens

### Training Dataset Summary

| Dataset Category | Count | Description |
|-----------------|-------|-------------|
| **Kubernetes Official Docs** | 8,910 | Concepts, kubectl, setup, tasks, tutorials |
| **Stack Overflow** | 52,000 | Kubernetes Q&A from the community |
| **DevOps Datasets** | 62,500 | General DevOps and Kubernetes content |
| **Configurations & CLI** | 36,800 | Kubernetes configs, kubectl examples, operators |
| **Total** | **~157,210** | Comprehensive Kubernetes knowledge base |

## Quantization Details

All models were quantized using llama.cpp with importance matrix optimization (a minimal reproduction sketch is included at the end of this card):

- **Source:** Merged LoRA adapters with the base model
- **Quantization Tool:** llama.cpp (latest)
- **Method:** K-quant and IQ-quant mixtures
- **Optimization:** Importance matrix for better quality

### Quantization Quality

- **Q4_K_M:** Best balance - recommended for most users
- **Q4_K_S:** Slightly smaller with minimal quality loss
- **Q3_K_M:** Good for memory-constrained systems
- **IQ3_M:** Advanced 3-bit quantization for laptops
- **Unquantized:** Original F16/F32 precision

## Hardware Requirements

### Minimum

- **CPU:** 4+ cores
- **RAM:** 8 GB (for IQ3_M/Q3_K_M quantizations)
- **Storage:** 6-8 GB free space
- **GPU:** Not required (CPU inference)

### Recommended

- **CPU:** 8+ cores
- **RAM:** 16 GB (for Q4_K_M/Q4_K_S quantizations)
- **Storage:** 10 GB free space
- **GPU:** Optional (can accelerate inference)

## License

This model is released under the **MIT License**. It is free to use in commercial and open-source projects.

## Contact

**Produced by:** HAVELSAN/Açıklab

For questions or feedback, please open an issue on the model repository.

---

**Note:** These are GGUF quantized versions ready for immediate use. No additional model loading or merging is required.
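**Reproduction sketch:** the importance-matrix quantization workflow described under *Quantization Details* can be reproduced roughly as follows. This is a hypothetical outline rather than the exact commands used for this repository: it assumes a recent llama.cpp build (providing the `llama-imatrix` and `llama-quantize` tools), and `calibration.txt` is an illustrative placeholder for a plain-text calibration corpus.

```bash
# 1. Compute an importance matrix from calibration text using the unquantized GGUF.
./llama-imatrix -m ./kubernetes-ai.gguf -f ./calibration.txt -o ./kubernetes-ai.imatrix

# 2. Quantize to Q4_K_M, using the importance matrix to preserve quality at low bit widths.
./llama-quantize --imatrix ./kubernetes-ai.imatrix \
  ./kubernetes-ai.gguf ./kubernetes-ai-Q4_K_M.gguf Q4_K_M
```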