---
license: mit
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- infrastructure-as-code
- terraform
- kubernetes
- docker
- devops
- iac
- dapo
- reinforcement-learning
- fine-tuned
base_model: srallabandi0225/inframind-0.5b-grpo
datasets:
- custom
model-index:
- name: inframind-dapo
  results:
  - task:
      type: text-generation
      name: IaC Generation
    dataset:
      name: InfraMind-Bench
      type: custom
    metrics:
    - type: accuracy
      value: 96.4
      name: DAPO Accuracy
---

# InfraMind-DAPO: Infrastructure-as-Code Model with Direct Advantage Policy Optimization

**InfraMind-DAPO** is a 0.5B parameter language model fine-tuned for Infrastructure-as-Code (IaC) generation using **DAPO (Direct Advantage Policy Optimization)**, an advanced reinforcement learning technique that builds upon GRPO.

## Model Description

| Attribute | Value |
|-----------|-------|
| **Base Model** | [inframind-0.5b-grpo](https://huggingface.co/srallabandi0225/inframind-0.5b-grpo) |
| **Original Base** | Qwen/Qwen2.5-0.5B-Instruct |
| **Parameters** | 500M |
| **Training Method** | DAPO (Direct Advantage Policy Optimization) |
| **Domain** | Infrastructure-as-Code |
| **License** | MIT |

### Training Pipeline

```
Qwen2.5-0.5B-Instruct → GRPO Training → inframind-grpo → DAPO Training → inframind-dapo
                        (Stage 1)                        (Stage 2 - this model)
```

This model is the **second stage** of InfraMind training, starting from the GRPO-trained checkpoint and applying the DAPO innovations below for further refinement.

## What is DAPO?

**Direct Advantage Policy Optimization (DAPO)** is an advanced RL algorithm that improves upon GRPO with four key innovations:

| Innovation | Description | Benefit |
|------------|-------------|---------|
| **Clip-Higher** | Asymmetric clipping (ε_low=0.2, ε_high=0.28) | Allows high-advantage tokens to be reinforced more strongly |
| **Dynamic Sampling** | Skip batches with uniform rewards | Prevents entropy collapse, maintains exploration |
| **Token-Level Loss** | Per-token policy gradient | Finer-grained credit assignment |
| **Overlong Punishment** | Soft length penalty | Prevents verbose, repetitive outputs |
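To make the first three innovations concrete, here is a minimal, illustrative sketch of a DAPO-style token-level policy loss with Clip-Higher. It is not the training code used for this model; the ε values mirror the Clip-Higher row above, while the function and tensor names are assumptions for illustration.

```python
import torch

def dapo_policy_loss(logprobs, old_logprobs, advantages, mask,
                     eps_low=0.2, eps_high=0.28):
    """Illustrative DAPO-style loss (not this project's actual implementation).

    All inputs are (batch, seq_len) tensors holding per-token values for the
    sampled completions; `mask` is 1 for generated tokens and 0 for padding.
    """
    ratio = torch.exp(logprobs - old_logprobs)  # per-token importance ratio
    unclipped = ratio * advantages
    # Clip-Higher: the upper bound (1 + eps_high) is wider than the lower
    # bound (1 - eps_low), so positive-advantage tokens can be reinforced
    # more strongly than a symmetric PPO/GRPO clip would allow.
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high) * advantages
    per_token = -torch.minimum(unclipped, clipped)
    # Token-level loss: average over every generated token in the batch
    # rather than per sequence first, giving finer-grained credit assignment.
    return (per_token * mask).sum() / mask.sum()
```

Dynamic sampling then amounts to a batch filter: prompt groups whose sampled completions all receive the same reward produce zero advantages and are skipped instead of consuming gradient steps.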
### Why DAPO After GRPO?

| Stage | Method | Purpose |
|-------|--------|---------|
| Stage 1 | GRPO | Establish IaC generation capability from the base model |
| Stage 2 | DAPO | Refine with advanced techniques for quality improvement |

## Evaluation Results

| Model | Training Method | Accuracy | Pass Threshold |
|-------|-----------------|----------|----------------|
| **inframind-grpo** | GRPO | **97.3%** | 0.6 |
| **inframind-dapo** | DAPO | **96.4%** | 0.6 |
| Base (Qwen2.5-0.5B) | None | ~30% | 0.6 |

Evaluated on **InfraMind-Bench** (110 held-out test samples) across:

- Terraform (AWS, GCP, Azure)
- Kubernetes (Deployments, Services, Ingress)
- Docker (Dockerfile, docker-compose)
- CI/CD (GitHub Actions, GitLab CI)

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load DAPO model
model = AutoModelForCausalLM.from_pretrained("srallabandi0225/inframind-0.5b-dapo")
tokenizer = AutoTokenizer.from_pretrained("srallabandi0225/inframind-0.5b-dapo")

# Generate Terraform
prompt = """### Instruction:
Create Terraform for AWS EC2 instance

### Input:
t3.micro instance type

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Example Output

```hcl
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  tags = {
    Name = "web-server"
  }
}
```

## Supported IaC Categories

| Category | Examples | Coverage |
|----------|----------|----------|
| **Terraform** | EC2, S3, VPC, RDS, EKS, Lambda, IAM | AWS, GCP, Azure |
| **Kubernetes** | Deployment, Service, Ingress, ConfigMap, RBAC | All K8s resources |
| **Docker** | Dockerfile, docker-compose | Multi-stage builds |
| **CI/CD** | GitHub Actions, GitLab CI, Jenkins | Workflows, pipelines |
| **Ansible** | Playbooks, roles | Server configuration |
| **Helm** | Charts, values.yaml | K8s package management |

## Training Details

### DAPO Configuration

```yaml
Training:
  epochs: 2
  batch_size: 16          # effective
  learning_rate: 5e-6
  beta: 0.0               # KL penalty - pure DAPO, no KL term
  generations_per_prompt: 8

DAPO Innovations:
  clip_higher:
    epsilon_low: 0.2
    epsilon_high: 0.28
  dynamic_sampling: true
  token_level_loss: true
  overlong_punishment:
    enabled: true
    soft_penalty: true

LoRA:
  r: 16
  alpha: 32
  target_modules: [q_proj, k_proj, v_proj, o_proj]
```

### Reward Function

Domain-specific reward for IaC quality:

```
Reward = α × Syntax + β × Correctness + γ × Format

Where:
- Syntax      (α = 0.4): valid resource declarations
- Correctness (β = 0.3): correct resource types
- Format      (γ = 0.3): proper structure
```
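As a rough sketch only, the weighted sum above could be computed along these lines. The three scoring helpers (`syntax_score`, `correctness_score`, `format_score`) are hypothetical placeholders, not this project's actual checks; only the α/β/γ weights come from the formula above.

```python
# Hypothetical sketch of the weighted IaC reward described above.
# The scoring helpers are placeholders, not this project's real checks.

def syntax_score(code: str) -> float:
    """Placeholder: 1.0 if the output contains a recognizable resource declaration."""
    return 1.0 if ("resource " in code or "apiVersion:" in code) else 0.0

def correctness_score(code: str, expected_resource: str) -> float:
    """Placeholder: 1.0 if the requested resource type appears in the output."""
    return 1.0 if expected_resource in code else 0.0

def format_score(code: str) -> float:
    """Placeholder: crude structure check via balanced braces."""
    return 1.0 if code.count("{") == code.count("}") else 0.5

def iac_reward(code: str, expected_resource: str,
               alpha: float = 0.4, beta: float = 0.3, gamma: float = 0.3) -> float:
    """Reward = alpha*Syntax + beta*Correctness + gamma*Format, weights as above."""
    return (alpha * syntax_score(code)
            + beta * correctness_score(code, expected_resource)
            + gamma * format_score(code))
```

Because the weights sum to 1 and each component is bounded by 1, a reward of this shape stays in [0, 1].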
## GRPO vs DAPO Comparison

| Aspect | GRPO | DAPO |
|--------|------|------|
| KL Penalty | β=0.04 | β=0.0 (none) |
| Clipping | Symmetric | Asymmetric (Clip-Higher) |
| Loss Granularity | Sequence-level | Token-level |
| Sampling | All batches | Dynamic (skip uniform) |
| Length Control | None | Overlong punishment |

## Hardware Requirements

| Deployment | Memory | GPU |
|------------|--------|-----|
| Training | 16GB+ | A100/A10G |
| Inference | 2GB | Optional |
| Edge (Raspberry Pi 5) | 4GB | None |

The 0.5B model is small enough to run on edge devices, making it suitable for:

- Air-gapped environments
- Local development
- CI/CD pipelines
- IoT/Edge infrastructure

## Limitations

- **IaC-specific**: Optimized for infrastructure tasks, not general conversation
- **English only**: Training data is in English
- **No execution**: Generates code, does not execute or validate against real infrastructure
- **Version-sensitive**: Generated code may use older API versions
- **Security**: Always review generated code for security best practices

### Out-of-Scope Uses

- Legal or medical advice
- General-purpose chatbot
- Executing infrastructure changes without human review
- Production deployment without validation

## Intended Use

### Primary Use Cases

- Generating Terraform configurations
- Creating Kubernetes manifests
- Writing Dockerfiles and docker-compose
- Building CI/CD pipelines
- Infrastructure automation scripting

### Users

- DevOps engineers
- Platform engineers
- SREs
- Cloud architects
- Infrastructure developers

## Training Data

**InfraMind-Bench**: 2000+ IaC tasks in Alpaca format

| Category | Tasks |
|----------|-------|
| Terraform | 500+ |
| Kubernetes | 400+ |
| Docker | 300+ |
| CI/CD | 300+ |
| Ansible | 200+ |
| Helm | 150+ |
| Monitoring | 150+ |

## Ethical Considerations

- The model may generate insecure configurations if not prompted for security
- Generated infrastructure code should always be reviewed before deployment
- The model does not have access to real infrastructure or credentials
- Users are responsible for validating generated code against their security policies

## Citation

```bibtex
@misc{rallabandi2024inframind,
  title={InfraMind: Fine-tuning Small Language Models for Infrastructure-as-Code Generation with Reinforcement Learning},
  author={Rallabandi, Sai Kiran},
  year={2024},
  publisher={HuggingFace},
  url={https://huggingface.co/srallabandi0225/inframind-0.5b-dapo}
}
```

## Links

- **GitHub**: [github.com/saikiranrallabandi/inframind](https://github.com/saikiranrallabandi/inframind)
- **GRPO Model**: [srallabandi0225/inframind-0.5b-grpo](https://huggingface.co/srallabandi0225/inframind-0.5b-grpo)
- **DAPO Model**: [srallabandi0225/inframind-0.5b-dapo](https://huggingface.co/srallabandi0225/inframind-0.5b-dapo)

## Acknowledgments

- [Qwen Team](https://github.com/QwenLM/Qwen) for the base model
- [DeepSeek](https://github.com/deepseek-ai) for GRPO
- [NVIDIA NeMo](https://docs.nvidia.com/nemo) for the DAPO reference
- [TRL](https://github.com/huggingface/trl) for training infrastructure

## Model Card Contact

**Author**: Sai Kiran Rallabandi
**GitHub**: [@saikiranrallabandi](https://github.com/saikiranrallabandi)