ToMMeR-Llama-3.2-1B_L5_R64
ToMMeR is a lightweight probing model that extracts emergent mention detection capabilities from the early-layer representations of any LLM backbone, achieving high zero-shot recall across a wide set of 13 NER benchmarks.
Checkpoint Details
| Property | Value | 
|---|---|
| Base LLM | meta-llama/Llama-3.2-1B | 
| Layer | 5 | 
| #Params | 264.2K | 
Usage
Installation
Our code can be installed with pip+git. Please visit the repository for more details.
pip install git+https://github.com/VictorMorand/llm2ner.git
Fancy Outputs
import llm2ner
from llm2ner import ToMMeR

tommer = ToMMeR.from_pretrained("llm2ner/ToMMeR-Llama-3.2-1B_L5_R64")

# load the backbone LLM, optionally cutting the unused layers to save GPU memory
llm = llm2ner.utils.load_llm(tommer.llm_name, cut_to_layer=tommer.layer)
tommer.to(llm.device)

text = "Large language models are awesome. While trained on language modeling, they exhibit emergent Zero Shot abilities that make them suitable for a wide range of tasks, including Named Entity Recognition (NER)."

# fancy interactive output
outputs = llm2ner.plotting.demo_inference(
    text, tommer, llm,
    decoding_strategy="threshold",  # or "greedy" for flat segmentation
    threshold=0.5,  # default 50%
    show_attn=True,
)
(Rendered demo output: the input text is displayed with the predicted mention spans highlighted, e.g. "Large language models", "language modeling", "tasks", "Named Entity Recognition", "NER".)
Raw Inference
By default, ToMMeR outputs span probabilities, but we also provide built-in options for decoding entities.
- Inputs:
  - tokens (batch, seq): tokens to process,
  - model: LLM to extract representations from.
- Outputs: (batch, seq, seq) matrix of span scores (masked outside valid spans).
tommer = ToMMeR.from_pretrained("llm2ner/ToMMeR-Llama-3.2-1B_L5_R64")

# load the backbone LLM, optionally cutting the unused layers to save GPU memory
llm = llm2ner.utils.load_llm(tommer.llm_name, cut_to_layer=tommer.layer)
tommer.to(llm.device)

# Raw inference
text = ["Large language models are awesome"]
print(f"Input text: {text[0]}")

# tokenize into shape (1, seq_len)
tokens = llm.tokenizer(text, return_tensors="pt")["input_ids"].to(llm.device)

# raw span scores
output = tommer.forward(tokens, llm)  # (batch_size, seq_len, seq_len)
print(f"Raw Output shape: {output.shape}")

# use the chosen decoding strategy to infer entities
entities = tommer.infer_entities(tokens=tokens, model=llm, threshold=0.5, decoding_strategy="greedy")
str_entities = [llm.tokenizer.decode(tokens[0, b:e + 1]) for b, e in entities[0]]
print(f"Predicted entities: {str_entities}")

>>> Input text: Large language models are awesome
>>> Raw Output shape: torch.Size([1, 6, 6])
>>> Predicted entities: ['Large language models']
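For reference, here is a minimal sketch of how entities could be decoded from the raw span matrix with a simple threshold. It assumes output[0, i, j] scores the span from token i to token j; the function below is illustrative and not part of the llm2ner API.

import torch

def threshold_decode(span_probs: torch.Tensor, threshold: float = 0.5):
    """Keep every span whose probability exceeds the threshold (illustrative sketch)."""
    seq_len = span_probs.shape[0]
    spans = []
    for i in range(seq_len):           # span start
        for j in range(i, seq_len):    # span end (inclusive)
            if span_probs[i, j] > threshold:
                spans.append((i, j))
    return spans

# e.g. spans = threshold_decode(output[0]); each (b, e) can then be decoded with llm.tokenizer.decode(tokens[0, b:e + 1])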
Please visit the repository for more details and a demo notebook.
Evaluation Results
| dataset | precision | recall | f1 | n_samples | 
|---|---|---|---|---|
| MultiNERD | 0.2054 | 0.9877 | 0.3401 | 154144 | 
| CoNLL 2003 | 0.3036 | 0.9606 | 0.4614 | 16493 | 
| CrossNER_politics | 0.2841 | 0.9664 | 0.4391 | 1389 | 
| CrossNER_AI | 0.3308 | 0.973 | 0.4938 | 879 | 
| CrossNER_literature | 0.3621 | 0.9443 | 0.5235 | 916 | 
| CrossNER_science | 0.3636 | 0.9567 | 0.527 | 1193 | 
| CrossNER_music | 0.3859 | 0.9541 | 0.5495 | 945 | 
| ncbi | 0.1151 | 0.9369 | 0.2051 | 3952 | 
| FabNER | 0.2898 | 0.7363 | 0.416 | 13681 | 
| WikiNeural | 0.1953 | 0.9874 | 0.3261 | 92672 | 
| GENIA_NER | 0.2219 | 0.9667 | 0.361 | 16563 | 
| ACE 2005 | 0.2827 | 0.4551 | 0.3488 | 8230 | 
| Ontonotes | 0.2347 | 0.7526 | 0.3578 | 42193 | 
| Aggregated | 0.2177 | 0.9341 | 0.3531 | 353250 | 
| Mean | 0.275 | 0.8906 | 0.4115 | 353250 | 
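For context, these are span-level scores. A minimal sketch of exact-match span precision, recall, and F1 is given below; it is illustrative only and not necessarily the exact evaluation code behind the table.

def span_prf(pred_spans: set, gold_spans: set):
    """Exact-match span precision, recall, and F1 over (start, end) pairs (illustrative sketch)."""
    tp = len(pred_spans & gold_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# e.g. span_prf({(0, 2), (5, 6)}, {(0, 2)})  # -> (0.5, 1.0, 0.666...)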
Citation
If using this model or the approach, please cite the associated paper:
@misc{morand2025tommerefficiententity,
      title={ToMMeR -- Efficient Entity Mention Detection from Large Language Models}, 
      author={Victor Morand and Nadi Tomeh and Josiane Mothe and Benjamin Piwowarski},
      year={2025},
      eprint={2510.19410},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.19410}, 
}
License
Apache-2.0 (see repository for full text).