---
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
library_name: peft
datasets:
  - LawInformedAI/claudette_tos
metrics:
  - accuracy
  - precision
  - recall
  - f1
pipeline_tag: text-classification
---

# TinyLlama-ToS-Finetuned

A LoRA-finetuned version of TinyLlama-1.1B-Chat-v1.0 for detecting unfair or anomalous Terms of Service (ToS) clauses. The model classifies clauses as Fair or Unfair based on anomalous patterns in legal text.


## Model Details

### Model Description

- Developed by: Noshitha Padma Pratyusha Juttu (UMass Amherst, MS CS 2024–25)
- Model type: Causal LM with LoRA adapters for classification
- Base model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
- Total parameters (base + LoRA): ~1.101B
- LoRA trainable parameters: ~1.13M (≈0.1% of the base model)
- Language(s): English
- License: Apache-2.0 (same as the base model)

This model was finetuned with LoRA adapters. During training, only ~1.13M parameters were updated, while the 1.1B base model parameters remained frozen. The final uploaded model contains both the base weights and the adapter weights.
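For reference, the sketch below shows a `peft` LoRA setup consistent with the parameter counts reported above. The specific hyperparameters (rank, alpha, dropout, target modules) are assumptions chosen for illustration, not values taken from the released training code.

```python
# Minimal sketch of a LoRA configuration consistent with ~1.13M trainable
# parameters on TinyLlama-1.1B. All hyperparameter values below are assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

lora_config = LoraConfig(
    r=8,                                   # assumed rank; r=8 on q_proj/v_proj gives ~1.13M params
    lora_alpha=16,                         # assumed scaling factor
    lora_dropout=0.05,                     # assumed dropout
    target_modules=["q_proj", "v_proj"],   # assumed target attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
# prints roughly: trainable params: ~1.1M || all params: ~1.1B || trainable%: ~0.10
```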

## 📚 Citation

If you use this model in your research or work, please cite the following paper:

Juttu, Noshitha Padma Pratyusha. Text to Trust: Evaluating Fine-Tuning and LoRA Trade-Offs in Language Models for Unfair Terms of Service Detection. arXiv preprint arXiv:2510.22531, 2025.
https://arxiv.org/abs/2510.22531

## Uses

### Direct Use

- Clause-level classification of Terms of Service agreements.
- Detects whether a clause is likely "Unfair" or "Fair".

### Downstream Use

- Legal NLP research and experiments.
- Integration into compliance assistants for contract review.

### Out-of-Scope Use

- Not a substitute for professional legal advice.
- Not guaranteed to generalize beyond English-language contracts.

## Bias, Risks, and Limitations

- Trained only on the Claudette ToS dataset, so it may not represent all legal documents.
- May produce false positives and false negatives, especially on borderline clauses.
- Outputs can be sensitive to prompt phrasing.

### Recommendations

Use this model as an assistive tool, not for automated legal decision-making.


## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter = "Noshitha98/TinyLlama-ToS-Finetuned"

# Load the frozen base model, then attach the LoRA adapter weights.
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)
model = PeftModel.from_pretrained(model, adapter)
model.eval()

# Prompt format used for clause classification.
prompt = "<s>[CLAUSE]: You agree that we may suspend your account at any time. \n[Is this anomalous?]:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
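For batch scanning of contracts, a thin wrapper such as the hypothetical `classify_clause` helper below (reusing the `tokenizer` and `model` loaded above) can map the generated continuation to a label. The `Unfair`/`Fair` keyword matching is an assumption about the model's output vocabulary; adjust it to the strings your checkpoint actually emits.

```python
# Hypothetical helper: wraps generation and maps the continuation to a label.
# The keyword matching below is an assumption, not a documented output format.
def classify_clause(clause: str) -> str:
    prompt = f"<s>[CLAUSE]: {clause} \n[Is this anomalous?]:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=5)
    # Decode only the newly generated tokens, not the prompt.
    continuation = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return "Unfair" if "unfair" in continuation.lower() else "Fair"

print(classify_clause("We may change these terms at any time without notice."))
```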