---
license: mit
language:
- en
base_model:
- jhu-clsp/ettin-encoder-32m
pipeline_tag: token-classification
tags:
- token classification
- hallucination detection
- retrieval-augmented generation
- transformers
- ettin
- lightweight
datasets:
- enelpol/rag-mini-bioasq
library_name: transformers
---

# TinyLettuce (Ettin-32M): Efficient Hallucination Detection

<p align="center">
  <img src="https://github.com/KRLabsOrg/LettuceDetect/blob/dev/assets/tinytinylettuce.png?raw=true" alt="TinyLettuce" width="400"/>
</p>

**Model Name:** tinylettuce-ettin-32m-en-v1

**Organization:** KRLabsOrg  

**Github:** https://github.com/KRLabsOrg/LettuceDetect

**Ettin encoders:** https://arxiv.org/pdf/2507.11412

## Overview

TinyLettuce is a token‑classification model that flags unsupported spans in answers given context. The 32M Ettin variant balances accuracy and CPU‑side efficiency; it’s designed for low‑cost domain fine‑tuning on synthetic data.

Trained on our synthetic dataset (mixed with RAGTruth), this 32M variant achieves 88.76% F1 on the held‑out synthetic test set, outperforming large LLM judges such as GPT-OSS-120B and demonstrating the effectiveness of our domain‑specific hallucination data generation pipeline.

## Model Details

- Architecture: Ettin encoder (32M) + token‑classification head
- Task: token classification (0 = supported, 1 = hallucinated)
- Input: `[CLS] context [SEP] question [SEP] answer [SEP]`, up to 4096 tokens
- Language: English; License: MIT
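
At inference time, the per-token 0/1 predictions are merged back into character-level spans. A minimal sketch of that post-processing step, assuming per-token labels and character offsets are already available (function name and inputs are hypothetical, not the library's API):

```python
def tokens_to_spans(offsets, labels, text):
    """Merge runs of tokens labeled 1 (hallucinated) into character spans.

    offsets: list of (start, end) character offsets, one per answer token
    labels:  list of 0/1 predictions, one per answer token
    """
    spans, current = [], None
    for (start, end), label in zip(offsets, labels):
        if label == 1:
            if current is None:
                current = [start, end]   # open a new span
            else:
                current[1] = end         # extend the open span
        elif current is not None:
            spans.append({"start": current[0], "end": current[1],
                          "text": text[current[0]:current[1]]})
            current = None
    if current is not None:              # close a span that runs to the end
        spans.append({"start": current[0], "end": current[1],
                      "text": text[current[0]:current[1]]})
    return spans
```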

## Training Data

- Synthetic (train): ~1,500 hallucinated samples (≈3,000 with non‑hallucinated) from enelpol/rag-mini-bioasq; intensity 0.3.
- Synthetic (test): 300 hallucinated samples (≈600 total) held out.

## Training Procedure

- Tokenizer: AutoTokenizer; DataCollatorForTokenClassification; label pad −100
- Max length: 4096; batch size: 8; epochs: 3
- Optimizer: AdamW (lr 1e‑5, weight_decay 0.01)
- Hardware: Single A100 80GB
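
The −100 label pad mentioned above is the standard way to mask tokens out of the cross-entropy loss in token classification. A minimal sketch of the alignment step, assuming character-level labels and a fast tokenizer's `offset_mapping` (helper name is hypothetical):

```python
def align_labels(offset_mapping, char_labels):
    """Assign each token the label of its first character.

    Special tokens such as [CLS]/[SEP] and padding carry the offset (0, 0);
    they receive -100 so the loss function ignores them.
    """
    aligned = []
    for start, end in offset_mapping:
        if start == end:                 # special or padding token
            aligned.append(-100)
        else:
            aligned.append(char_labels[start])
    return aligned
```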

## Results

Synthetic (domain‑specific):

| Model | Parameters | Precision (%) | Recall (%) | F1 (%) | Hardware |
|-------|------------|---------------|------------|--------|----------|
| TinyLettuce-17M | 17M | 84.56 | 98.21 | 90.87 | CPU |
| **TinyLettuce-32M** | 32M | 80.36 | 99.10 | 88.76 | CPU |
| TinyLettuce-68M | 68M | 89.54 | 95.96 | 92.64 | CPU |
| GPT-5-mini | ~200B | 71.95 | 100.00 | 83.69 | API/GPU |
| GPT-OSS-120B | 120B | 72.21 | 98.64 | 83.38 | GPU |
| Qwen3-235B | 235B | 66.74 | 99.32 | 79.84 | GPU |
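
For reference, span-level precision/recall/F1 can be computed by comparing predicted and gold spans; a minimal exact-match sketch (the evaluation protocol behind the table above may differ, e.g. example-level or overlap-based scoring):

```python
def span_prf(pred_spans, gold_spans):
    """Exact-match span precision, recall, and F1 over (start, end) pairs."""
    pred = {(s["start"], s["end"]) for s in pred_spans}
    gold = {(s["start"], s["end"]) for s in gold_spans}
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```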


## Usage

First install lettucedetect:

```bash
pip install lettucedetect
```

Then use it:

```python
from lettucedetect.models.inference import HallucinationDetector

detector = HallucinationDetector(
    method="transformer",
    model_path="KRLabsOrg/tinylettuce-ettin-32m-en-v1",
)

spans = detector.predict(
    context=[
        "Ibuprofen is an NSAID that reduces inflammation and pain. The typical adult dose is 400-600mg every 6-8 hours, not exceeding 2400mg daily."
    ],
    question="What is the maximum daily dose of ibuprofen?",
    answer="The maximum daily dose of ibuprofen for adults is 3200mg.",
    output_format="spans",
)
print(spans)
# Output: [{"start": 51, "end": 57, "text": "3200mg"}]
```

## Citing

If you use the model or the tool, please cite the following paper:

```bibtex
@misc{Kovacs:2025,
      title={LettuceDetect: A Hallucination Detection Framework for RAG Applications}, 
      author={Ádám Kovács and Gábor Recski},
      year={2025},
      eprint={2502.17125},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.17125}, 
}
```