---
license: mit
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- infrastructure-as-code
- terraform
- kubernetes
- docker
- devops
- iac
- dapo
- reinforcement-learning
- fine-tuned
base_model: srallabandi0225/inframind-0.5b-grpo
datasets:
- custom
model-index:
- name: inframind-dapo
  results:
  - task:
      type: text-generation
      name: IaC Generation
    dataset:
      name: InfraMind-Bench
      type: custom
    metrics:
    - type: accuracy
      value: 96.4
      name: DAPO Accuracy
---

# InfraMind-DAPO: Infrastructure-as-Code Model with Decoupled Clip and Dynamic Sampling Policy Optimization (DAPO)

**InfraMind-DAPO** is a 0.5B-parameter language model fine-tuned for Infrastructure-as-Code (IaC) generation using **DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization)**, an advanced reinforcement learning technique that builds on GRPO.

## Model Description

| Attribute | Value |
|-----------|-------|
| **Base Model** | [inframind-0.5b-grpo](https://huggingface.co/srallabandi0225/inframind-0.5b-grpo) |
| **Original Base** | Qwen/Qwen2.5-0.5B-Instruct |
| **Parameters** | 500M |
| **Training Method** | DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) |
| **Domain** | Infrastructure-as-Code |
| **License** | MIT |

### Training Pipeline

```
Qwen2.5-0.5B-Instruct → GRPO Training → inframind-grpo → DAPO Training → inframind-dapo
                         (Stage 1)                        (Stage 2 - This Model)
```

This model is the **second stage** of InfraMind training: it starts from the GRPO-trained checkpoint and applies DAPO's innovations to further refine generation quality.

## What is DAPO?

**DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization)** is an advanced RL algorithm that improves on GRPO with four key innovations:

| Innovation | Description | Benefit |
|------------|-------------|---------|
| **Clip-Higher** | Asymmetric clipping (ε_low=0.2, ε_high=0.28) | Lets high-advantage tokens be reinforced more strongly |
| **Dynamic Sampling** | Skips batches with uniform rewards | Prevents entropy collapse, maintains exploration |
| **Token-Level Loss** | Per-token policy gradient | Finer-grained credit assignment |
| **Overlong Punishment** | Soft length penalty | Discourages verbose, repetitive outputs |
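
To make these concrete, below is a minimal, self-contained sketch of a DAPO-style token-level objective with Clip-Higher, a dynamic-sampling filter, and a soft overlong penalty. It is illustrative only, not this model's actual training code; the tensor shapes and helper names are assumptions.

```python
import torch

def dapo_token_loss(logp_new, logp_old, advantages, mask,
                    eps_low=0.2, eps_high=0.28):
    """DAPO-style token-level clipped objective (illustrative sketch).

    logp_new / logp_old: per-token log-probs under the current and sampling
    policies, shape (batch, seq). advantages: group-normalized advantage
    broadcast to every token of its sequence, same shape. mask: 1 for
    completion tokens, 0 for padding.
    """
    ratio = torch.exp(logp_new - logp_old)
    # Clip-Higher: the upper bound (1 + eps_high) is wider than the lower
    # bound, so positive-advantage tokens can be reinforced more strongly
    # than a symmetric PPO/GRPO clip would allow.
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high)
    per_token = torch.minimum(ratio * advantages, clipped * advantages)
    # Token-level loss: normalize over all valid tokens in the batch,
    # rather than averaging per sequence first as in GRPO.
    return -(per_token * mask).sum() / mask.sum().clamp(min=1)

def keep_group(rewards: torch.Tensor, atol: float = 1e-6) -> bool:
    """Dynamic sampling: skip a prompt whose sampled completions all earn
    the same reward - the group-normalized advantage would be zero, so the
    group contributes no learning signal."""
    return bool(rewards.std() > atol)

def overlong_penalty(gen_len: int, max_len: int = 512, buffer: int = 64) -> float:
    """Soft overlong punishment: no penalty up to (max_len - buffer), then
    a linearly increasing penalty that reaches -1.0 at max_len."""
    soft_start = max_len - buffer
    if gen_len <= soft_start:
        return 0.0
    return -min(gen_len - soft_start, buffer) / buffer
```

With β=0.0 (no KL penalty, per the configuration below), this clipped term is effectively the whole objective.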
|
|
|
|
|
### Why DAPO After GRPO?

| Stage | Method | Purpose |
|-------|--------|---------|
| Stage 1 | GRPO | Establish IaC generation capability from the base model |
| Stage 2 | DAPO | Refine with advanced techniques for quality improvement |

## Evaluation Results

| Model | Training Method | Accuracy | Pass Threshold |
|-------|-----------------|----------|----------------|
| **inframind-grpo** | GRPO | **97.3%** | 0.6 |
| **inframind-dapo** | DAPO | **96.4%** | 0.6 |
| Base (Qwen2.5-0.5B) | None | ~30% | 0.6 |

Evaluated on **InfraMind-Bench** (110 held-out test samples; scoring is sketched after the list below) across:

- Terraform (AWS, GCP, Azure)
- Kubernetes (Deployments, Services, Ingress)
- Docker (Dockerfile, docker-compose)
- CI/CD (GitHub Actions, GitLab CI)
|
|
|
|
## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the DAPO-trained model and tokenizer
model = AutoModelForCausalLM.from_pretrained("srallabandi0225/inframind-0.5b-dapo")
tokenizer = AutoTokenizer.from_pretrained("srallabandi0225/inframind-0.5b-dapo")

# Generate Terraform using the Alpaca-style prompt format from training
prompt = """### Instruction:
Create Terraform for AWS EC2 instance
### Input:
t3.micro instance type
### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Example Output

```hcl
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  tags = {
    Name = "web-server"
  }
}
```

## Supported IaC Categories

| Category | Examples | Coverage |
|----------|----------|----------|
| **Terraform** | EC2, S3, VPC, RDS, EKS, Lambda, IAM | AWS, GCP, Azure |
| **Kubernetes** | Deployment, Service, Ingress, ConfigMap, RBAC | All K8s resources |
| **Docker** | Dockerfile, docker-compose | Multi-stage builds |
| **CI/CD** | GitHub Actions, GitLab CI, Jenkins | Workflows, pipelines |
| **Ansible** | Playbooks, roles | Server configuration |
| **Helm** | Charts, values.yaml | K8s package management |

## Training Details

### DAPO Configuration

```yaml
Training:
  epochs: 2
  batch_size: 16              # effective batch size
  learning_rate: 5e-6
  beta: 0.0                   # KL coefficient - pure DAPO, no KL penalty
  generations_per_prompt: 8

DAPO Innovations:
  clip_higher:
    epsilon_low: 0.2
    epsilon_high: 0.28
  dynamic_sampling: true
  token_level_loss: true
  overlong_punishment:
    enabled: true
    soft_penalty: true

LoRA:
  r: 16
  alpha: 32
  target_modules: [q_proj, k_proj, v_proj, o_proj]
```
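
For readers who want to reproduce a similar setup, here is a hedged sketch using TRL's `GRPOTrainer`. It assumes a recent TRL release whose `GRPOConfig` exposes `beta`, `epsilon`, and `epsilon_high` (the hook for Clip-Higher) and a prompt/completion-style dataset; argument names vary across TRL versions, and `iac_reward` / `train_dataset` are stand-ins, so treat this as a starting point rather than the exact script behind this checkpoint.

```python
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

def iac_reward(prompts, completions, **kwargs):
    """Stand-in reward: score each completion in [0, 1]. The composite
    actually described by this card is in the Reward Function section."""
    return [1.0 if "resource" in c else 0.0 for c in completions]

config = GRPOConfig(
    output_dir="inframind-dapo",
    learning_rate=5e-6,
    num_train_epochs=2,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size 16
    num_generations=8,               # generations per prompt
    beta=0.0,                        # no KL penalty (pure DAPO)
    epsilon=0.2,                     # lower clip bound
    epsilon_high=0.28,               # Clip-Higher upper bound
)

trainer = GRPOTrainer(
    model="srallabandi0225/inframind-0.5b-grpo",  # stage-1 checkpoint
    reward_funcs=iac_reward,
    args=config,
    train_dataset=train_dataset,     # Alpaca-format IaC tasks (not shown)
    peft_config=LoraConfig(
        r=16, lora_alpha=32,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    ),
)
trainer.train()
```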
|
|
|
|
|
### Reward Function

Domain-specific reward for IaC quality:

```
Reward = α × Syntax + β × Correctness + γ × Format

Where:
- Syntax (α=0.4): valid resource declarations
- Correctness (β=0.3): correct resource types
- Format (γ=0.3): proper structure
```
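
A minimal sketch of this weighted composite, with deliberately simple stand-ins for the three component checks (the real heuristics behind Syntax, Correctness, and Format are not published in this card):

```python
WEIGHTS = {"syntax": 0.4, "correctness": 0.3, "format": 0.3}

def iac_reward(completion: str, expected_resource: str) -> float:
    """Weighted IaC reward in [0, 1] (illustrative stand-in checks)."""
    # Syntax: crude balance check standing in for a real HCL/YAML parser
    syntax = 1.0 if completion.count("{") == completion.count("}") else 0.0
    # Correctness: does the expected resource type appear at all?
    correctness = 1.0 if expected_resource in completion else 0.0
    # Format: rough structural check on how the snippet opens
    fmt = 1.0 if completion.strip().startswith("resource") else 0.5
    return (WEIGHTS["syntax"] * syntax
            + WEIGHTS["correctness"] * correctness
            + WEIGHTS["format"] * fmt)
```

Since each component lies in [0, 1] and the weights sum to 1, the total reward also lies in [0, 1], so the 0.6 pass threshold from the evaluation applies directly.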
|
|
|
|
|
## GRPO vs DAPO Comparison

| Aspect | GRPO | DAPO |
|--------|------|------|
| KL Penalty | β=0.04 | β=0.0 (none) |
| Clipping | Symmetric | Asymmetric (Clip-Higher) |
| Loss Granularity | Sequence-level | Token-level |
| Sampling | All batches | Dynamic (skip uniform-reward batches) |
| Length Control | None | Overlong punishment |

## Hardware Requirements

| Deployment | Memory | GPU |
|------------|--------|-----|
| Training | 16GB+ | A100/A10G |
| Inference | 2GB | Optional |
| Edge (Raspberry Pi 5) | 4GB | None |

The 0.5B model is small enough to run on edge devices (a CPU-only loading sketch follows the list), making it suitable for:

- Air-gapped environments
- Local development
- CI/CD pipelines
- IoT/Edge infrastructure
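
For CPU-only hosts such as a Raspberry Pi 5, a plain full-precision load is the simplest route (quantized exports would shrink memory further but are out of scope here); a minimal sketch reusing the Quick Start prompt format:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# CPU-only loading for edge or air-gapped hosts. With no GPU available,
# transformers keeps the model on CPU; float32 is the safe dtype there.
model = AutoModelForCausalLM.from_pretrained(
    "srallabandi0225/inframind-0.5b-dapo",
    torch_dtype=torch.float32,
)
tokenizer = AutoTokenizer.from_pretrained("srallabandi0225/inframind-0.5b-dapo")

prompt = """### Instruction:
Create a Dockerfile for a Python web app
### Input:
python:3.12-slim base image
### Response:
"""
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256,
                         pad_token_id=tokenizer.pad_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```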
|
|
|
|
|
## Limitations

- **IaC-specific**: Optimized for infrastructure tasks, not general conversation
- **English only**: Training data is in English
- **No execution**: Generates code; it does not execute or validate against real infrastructure
- **Version-sensitive**: Generated code may reference older provider or API versions
- **Security**: Always review generated code for security best practices

### Out-of-Scope Uses

- Legal or medical advice
- General-purpose chatbot use
- Executing infrastructure changes without human review
- Production deployment without validation

## Intended Use

### Primary Use Cases

- Generating Terraform configurations
- Creating Kubernetes manifests
- Writing Dockerfiles and docker-compose files
- Building CI/CD pipelines
- Infrastructure automation scripting

### Users

- DevOps engineers
- Platform engineers
- SREs
- Cloud architects
- Infrastructure developers

## Training Data

**InfraMind-Bench**: 2,000+ IaC tasks in Alpaca format (110 samples are held out as the evaluation set reported above)

| Category | Tasks |
|----------|-------|
| Terraform | 500+ |
| Kubernetes | 400+ |
| Docker | 300+ |
| CI/CD | 300+ |
| Ansible | 200+ |
| Helm | 150+ |
| Monitoring | 150+ |

## Ethical Considerations

- The model may generate insecure configurations when the prompt does not ask for security hardening
- Generated infrastructure code should always be reviewed before deployment
- The model has no access to real infrastructure or credentials
- Users are responsible for validating generated code against their security policies

## Citation

```bibtex
@misc{rallabandi2024inframind,
  title={InfraMind: Fine-tuning Small Language Models for Infrastructure-as-Code Generation with Reinforcement Learning},
  author={Rallabandi, Sai Kiran},
  year={2024},
  publisher={HuggingFace},
  url={https://huggingface.co/srallabandi0225/inframind-0.5b-dapo}
}
```

## Links

- **GitHub**: [github.com/saikiranrallabandi/inframind](https://github.com/saikiranrallabandi/inframind)
- **GRPO Model (Stage 1)**: [srallabandi0225/inframind-0.5b-grpo](https://huggingface.co/srallabandi0225/inframind-0.5b-grpo)
- **DAPO Model (this model)**: [srallabandi0225/inframind-0.5b-dapo](https://huggingface.co/srallabandi0225/inframind-0.5b-dapo)

## Acknowledgments

- [Qwen Team](https://github.com/QwenLM/Qwen) for the base model
- [DeepSeek](https://github.com/deepseek-ai) for GRPO
- [NVIDIA NeMo](https://docs.nvidia.com/nemo) for the DAPO reference
- [TRL](https://github.com/huggingface/trl) for training infrastructure

## Model Card Contact

**Author**: Sai Kiran Rallabandi

**GitHub**: [@saikiranrallabandi](https://github.com/saikiranrallabandi)
|