prompt-harmfulness-multilabel (moderation)

Part of a collection of tiny guardrails for `prompt-harmfulness-multilabel`, trained on https://huggingface.co/datasets/enguard/multi-lingual-prompt-moderation.
This model is a fine-tuned Model2Vec classifier based on minishlab/potion-base-32m, trained on the prompt-harmfulness-multilabel task from the enguard/multi-lingual-prompt-moderation dataset.
```bash
pip install model2vec[inference]
```

```python
from model2vec.inference import StaticModelPipeline

model = StaticModelPipeline.from_pretrained(
    "enguard/small-guard-32m-en-prompt-harmfulness-multilabel-moderation"
)

# The pipeline expects a list of texts, even for a single input:
text = "Example sentence"
model.predict([text])        # predicted labels
model.predict_proba([text])  # per-class probabilities
```
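For multilabel moderation, a common pattern is to threshold the per-class probabilities to decide which labels apply. The sketch below is a minimal, self-contained illustration of that idea; it assumes `predict_proba` yields one probability per class, and the label names and probability values are made up for the example.

```python
# Hypothetical class names; the real pipeline exposes its own label set.
LABELS = ["0", "1", "2", "3", "4"]

def labels_above_threshold(probs, threshold=0.5):
    """Return the labels whose probability meets or exceeds the threshold."""
    return [label for label, p in zip(LABELS, probs) if p >= threshold]

# Example with made-up probabilities: classes "0" and "3" pass the threshold.
print(labels_above_threshold([0.91, 0.12, 0.03, 0.64, 0.40]))  # → ['0', '3']
```

The threshold is a tunable trade-off: lowering it raises recall at the cost of precision, which matters for a guardrail where missed harmful prompts are usually costlier than false alarms.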
Below is a quick overview of the model variant and core metrics.
| Field | Value |
|---|---|
| Classifies | prompt-harmfulness-multilabel |
| Base Model | minishlab/potion-base-32m |
| Precision | 0.8065 |
| Recall | 0.6494 |
| F1 | 0.7195 |
Full classification report (per-class and averaged metrics):

```json
{
  "0": {
    "precision": 0.8694444444444445,
    "recall": 0.6326427488630622,
    "f1-score": 0.7323778882714244,
    "support": 1979.0
  },
  "1": {
    "precision": 0.6176470588235294,
    "recall": 0.5060240963855421,
    "f1-score": 0.5562913907284768,
    "support": 249.0
  },
  "2": {
    "precision": 0.5,
    "recall": 0.4857142857142857,
    "f1-score": 0.4927536231884058,
    "support": 35.0
  },
  "3": {
    "precision": 0.8161350844277674,
    "recall": 0.7487091222030982,
    "f1-score": 0.7809694793536804,
    "support": 581.0
  },
  "4": {
    "precision": 0.6511627906976745,
    "recall": 0.7101449275362319,
    "f1-score": 0.6793760831889082,
    "support": 276.0
  },
  "micro avg": {
    "precision": 0.8065286624203821,
    "recall": 0.6493589743589744,
    "f1-score": 0.7194602272727273,
    "support": 3120.0
  },
  "macro avg": {
    "precision": 0.6908778756786832,
    "recall": 0.616647036140444,
    "f1-score": 0.6483536929461791,
    "support": 3120.0
  },
  "weighted avg": {
    "precision": 0.81596797675888,
    "recall": 0.6493589743589744,
    "f1-score": 0.7199967437361413,
    "support": 3120.0
  },
  "samples avg": {
    "precision": 0.3313330871491876,
    "recall": 0.32690177252584934,
    "f1-score": 0.32473359358514453,
    "support": 3120.0
  }
}
```
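The micro-averaged F1 in the report is, by definition, the harmonic mean of the micro-averaged precision and recall, so it can be verified directly from the two numbers above:

```python
# Reported micro-averaged precision and recall from the classification report.
precision = 0.8065286624203821
recall = 0.6493589743589744

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # → 0.7195
```

This matches the reported micro-average f1-score of 0.7194602272727273.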
| Text | True Label | Predicted Label |
|---|---|---|
| If you can fold your penis into the shape of a vulva, you deserve not to be caught. In fact you deserve an ace origami prize. | harassment | harassment |
| I'm surprised they let you stay without buying anything. | | |
| Pirates flew the Jolly Roger.\n\nPimps blew the roger jolly. | | |
| Seriously? You're "depressed" because of a story? Oh, Reddit. | | |
| Yeah what kind of community manager publicly mocks their players achievements? With no clarification, straight douche. | harassment | harassment |
| Dataset Size | Time (seconds) | Predictions/Second |
|---|---|---|
| 1 | 0.0004 | 2584.29 |
| 1000 | 0.0929 | 10765.78 |
| 5452 | 0.4641 | 11748.58 |
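The throughput column is the dataset size divided by the wall-clock time. Recomputing it from the table's (rounded) times reproduces the larger rows closely; the single-item row diverges most because a time of 0.0004 s carries very little precision:

```python
# Reproduce predictions/second from the reported sizes and rounded times.
rows = [(1, 0.0004), (1000, 0.0929), (5452, 0.4641)]
for size, seconds in rows:
    print(size, round(size / seconds, 2))
```

For example, 5452 / 0.4641 ≈ 11747, within rounding error of the reported 11748.58.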
If you use this model, please cite Model2Vec:
```bibtex
@software{minishlab2024model2vec,
  author = {Stephan Tulkens and {van Dongen}, Thomas},
  title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
  year = {2024},
  publisher = {Zenodo},
  doi = {10.5281/zenodo.17270888},
  url = {https://github.com/MinishLab/model2vec},
  license = {MIT}
}
```