# Gemma-3-270M Threat Classifier

## Model Description
Fine-tuned version of `google/gemma-3-270m` for binary threat classification (Safe vs Unsafe prompts).

## Training Details
- **Base Model**: google/gemma-3-270m
- **Task**: Binary Text Classification
- **Training Date**: 2025-12-31
- **Training Framework**: Hugging Face Transformers

## Hyperparameters
- Learning Rate: 2e-05
- Batch Size: 16
- Epochs: 10
- Max Length: 512
- Optimizer: adamw_torch

## Performance (Test Set)
- Accuracy: 0.8363
- Precision: 0.8232
- Recall: 0.8882
- F1 Score: 0.8544
- AUC-ROC: 0.9101

## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("path/to/model")
tokenizer = AutoTokenizer.from_pretrained("path/to/model")

text = "Your prompt here"
inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True)
outputs = model(**inputs)
prediction = outputs.logits.argmax(-1).item()
label = "unsafe" if prediction == 1 else "safe"
```

## Labels
- 0: Safe
- 1: Unsafe (Threat/Jailbreak)