# Gemma-3-270M Threat Classifier ## Model Description Fine-tuned version of `google/gemma-3-270m` for binary threat classification (Safe vs Unsafe prompts). ## Training Details - **Base Model**: google/gemma-3-270m - **Task**: Binary Text Classification - **Training Date**: 2025-12-31 - **Training Framework**: Hugging Face Transformers ## Hyperparameters - Learning Rate: 2e-05 - Batch Size: 16 - Epochs: 10 - Max Length: 512 - Optimizer: adamw_torch ## Performance (Test Set) - Accuracy: 0.8363 - Precision: 0.8232 - Recall: 0.8882 - F1 Score: 0.8544 - AUC-ROC: 0.9101 ## Usage ```python from transformers import AutoTokenizer, AutoModelForSequenceClassification model = AutoModelForSequenceClassification.from_pretrained("path/to/model") tokenizer = AutoTokenizer.from_pretrained("path/to/model") text = "Your prompt here" inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True) outputs = model(**inputs) prediction = outputs.logits.argmax(-1).item() label = "unsafe" if prediction == 1 else "safe" ``` ## Labels - 0: Safe - 1: Unsafe (Threat/Jailbreak)