---
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
library_name: peft
datasets:
- LawInformedAI/claudette_tos
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: text-classification
---

# TinyLlama-ToS-Finetuned

A LoRA-finetuned version of **TinyLlama-1.1B-Chat-v1.0** for detecting **unfair / anomalous Terms of Service clauses**. The model classifies clauses as **Fair** or **Unfair** based on anomalous patterns in legal text.
---

## Model Details

### Model Description

- **Developed by:** Noshitha Padma Pratyusha Juttu (UMass Amherst, MS CS 2024–25)
- **Model type:** Causal LM with LoRA adapters for classification
- **Base model:** [TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- **Total parameters (base + LoRA):** ~1.101B
- **LoRA trainable parameters:** ~1.13M (≈0.1% of the base model)
- **Language(s):** English
- **License:** Apache-2.0 (same as base model)

This model was fine-tuned with LoRA adapters: during training only the ~1.13M adapter parameters were updated, while the 1.1B base-model parameters remained frozen. The uploaded model contains both the base weights and the adapter weights.
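The adapter configuration is not reproduced on this card, but a LoRA setup of roughly this size can be sketched with `peft` as follows. The rank, alpha, dropout, and target modules below are illustrative assumptions, not the exact values used for this model.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative LoRA setup: r, lora_alpha, lora_dropout, and target_modules
# are assumptions, not the exact settings used to train this adapter.
base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

# Only the adapter weights are trainable; the 1.1B base weights stay frozen.
model.print_trainable_parameters()
```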
## 📚 Citation

If you use this model in your research or work, please cite the following paper:

> Juttu, Noshitha Padma Pratyusha. *Text to Trust: Evaluating Fine-Tuning and LoRA Trade-Offs in Language Models for Unfair Terms of Service Detection*. arXiv preprint arXiv:2510.22531, 2025.

https://arxiv.org/abs/2510.22531

### Model Sources

- **Repository:** [GitHub – UnfairTOSAgreementsDetection](https://github.com/Stimils02/UnfairTOSAgreementsDetection)
---

## Uses

### Direct Use

- Clause-level classification of Terms of Service agreements.
- Detects whether a clause is likely “Unfair” or “Fair”.

### Downstream Use

- Legal NLP research and experiments (see the evaluation sketch below).
- Integration into compliance assistants for contract review.
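For research use, the metrics listed in the card metadata (accuracy, precision, recall, F1) can be recomputed on the claudette_tos data. A minimal sketch is below; it assumes a `classify_clause` helper that returns 0 for Fair and 1 for Unfair (one is sketched at the end of this card), and the split and column names are assumptions about the dataset layout.

```python
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Assumptions: the dataset exposes "text" and "label" columns, and
# classify_clause(text) returns 0 (Fair) or 1 (Unfair) using the model.
ds = load_dataset("LawInformedAI/claudette_tos", split="train")

preds = [classify_clause(example["text"]) for example in ds]
labels = ds["label"]

acc = accuracy_score(labels, preds)
prec, rec, f1, _ = precision_recall_fscore_support(labels, preds, average="binary")
print(f"accuracy={acc:.3f}  precision={prec:.3f}  recall={rec:.3f}  f1={f1:.3f}")
```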
### Out-of-Scope Use

- Not a substitute for professional legal advice.
- Not guaranteed to generalize beyond English contracts.

---
## Bias, Risks, and Limitations

- Trained only on the Claudette ToS dataset, so it may not represent all legal documents.
- May produce false positives and false negatives, especially on borderline clauses.
- Outputs can be sensitive to prompt phrasing.

### Recommendations

Use this model as an **assistive tool**, not for automated legal decision-making.

---
## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter = "Noshitha98/TinyLlama-ToS-Finetuned"

# Load the base model and attach the LoRA adapter weights
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)
model = PeftModel.from_pretrained(model, adapter)

# Prompt format: the clause followed by the classification question
prompt = "<s>[CLAUSE]: You agree that we may suspend your account at any time. \n[Is this anomalous?]:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
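The raw completion still has to be mapped to a Fair/Unfair label. A minimal post-processing helper is sketched below; it reuses the `tokenizer` and `model` loaded above, and the keyword check on the generated text is an assumption about the output format, so adjust it to the completions you actually observe.

```python
def classify_clause(clause: str) -> int:
    """Return 1 if the model flags the clause as unfair/anomalous, else 0."""
    prompt = f"<s>[CLAUSE]: {clause} \n[Is this anomalous?]:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=5)
    # Decode only the newly generated tokens, not the prompt itself.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    completion = tokenizer.decode(new_tokens, skip_special_tokens=True).lower()
    # Assumed output format: the completion mentions "yes"/"unfair" for unfair clauses.
    return 1 if ("yes" in completion or "unfair" in completion) else 0

print(classify_clause("You agree that we may suspend your account at any time."))
```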