Model Card for lfm-2.5-1.2b-instruct_for_sys_review

This model is a fine-tuned classifier for systematic review screening. It predicts whether a paper should be included (1) or excluded (0) based on the study title and abstract.

Model Details

Model Description

This model adapts the LFM2.5-1.2B-Instruct base model for binary inclusion screening in systematic reviews. It is trained with supervised fine-tuning (SFT) using LoRA adapters and an instruction-following format that outputs a single label (0/1).

  • Developed by: Intelligent Agents Research Group (IARG-UF)
  • Funded by [optional]: Not disclosed
  • Shared by [optional]: IARG-UF
  • Model type: Causal language model, instruction-tuned binary classifier (LoRA)
  • Language(s) (NLP): English
  • License: Same as base model (see base model card)
  • Finetuned from model [optional]: unsloth/LFM2.5-1.2B-Instruct

Model Sources [optional]

  • Repository: https://github.com/Intelligent-Agents-Research-Group/llm_systematic_review

Uses

Direct Use

  • Automating Phase I screening in systematic reviews by classifying abstracts as include (1) or exclude (0).

Downstream Use [optional]

  • As a screening assistant in a human-in-the-loop workflow to prioritize studies for manual review.
  • As a baseline for additional fine-tuning on domain-specific screening tasks.

Out-of-Scope Use

  • Medical, legal, or safety-critical decision-making without expert oversight.
  • Tasks requiring nuanced multi-label classification beyond include/exclude.
  • Non-English abstracts or domains not represented in the training data.

Bias, Risks, and Limitations

  • The training data reflects specific inclusion criteria and may not generalize to other review topics.
  • Class imbalance can bias predictions toward exclusion.
  • The model may output a label even when evidence is insufficient or ambiguous.

Recommendations

  • Use as a triage tool with human verification.
  • Calibrate thresholds or prompts to reduce false positives.
  • Re-train or adapt for new domains and criteria.
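
One way to act on the threshold recommendation above is to score the two label tokens directly instead of trusting the generated text. The sketch below assumes you have obtained the logits assigned to the "0" and "1" tokens at the first generated position (how you extract those depends on your inference stack); the helper names and example values are hypothetical.

```python
import math


def include_probability(logit_0: float, logit_1: float) -> float:
    """Softmax over the two label-token logits; returns P(include)."""
    m = max(logit_0, logit_1)  # subtract max for numerical stability
    e0 = math.exp(logit_0 - m)
    e1 = math.exp(logit_1 - m)
    return e1 / (e0 + e1)


def screen(logit_0: float, logit_1: float, threshold: float = 0.5) -> int:
    """Return 1 (include) only when P(include) clears the threshold.

    Raising the threshold above 0.5 trades recall for precision,
    which helps reduce false positives in triage workflows.
    """
    return 1 if include_probability(logit_0, logit_1) >= threshold else 0
```

With a stricter threshold the same logits can flip from include to exclude, which is the calibration lever referred to above.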

How to Get Started with the Model

Use the code below to get started with the model.

from unsloth import FastLanguageModel

# Load the fine-tuned model and its tokenizer.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="models/lfm-2.5-1.2b-instruct_for_sys_review",
    max_seq_length=4096,
)
FastLanguageModel.for_inference(model)  # enable inference mode

prompt = """Assess inclusion for the review.
Title: 'ChatGPT as a Software Development Bot: A Project-Based Study'.
Abstract: The study examines ChatGPT as a support tool for undergraduate
software development projects and evaluates learning outcomes.
Output 0 or 1."""

messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
).to("cuda")

# Greedy decoding (do_sample=False) is deterministic; transformers rejects
# temperature=0.0, so disable sampling instead of zeroing the temperature.
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens, not the echoed prompt.
generated = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))

Training Details

Training Data

  • Source: data/train_data_LFM.csv
  • Task: Binary inclusion/exclusion screening based on title + abstract prompts.
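
The training examples pair a title + abstract prompt with a 0/1 completion. A minimal sketch of that construction is shown below; the column names `title`, `abstract`, and `label` are assumptions, and the actual schema of data/train_data_LFM.csv may differ.

```python
import csv
import io

# Mirrors the inference prompt so train and test inputs match.
PROMPT_TEMPLATE = (
    "Assess inclusion for the review.\n"
    "Title: '{title}'.\n"
    "Abstract: {abstract}\n"
    "Output 0 or 1."
)


def build_examples(csv_text: str):
    """Yield (prompt, label) pairs for SFT from a screening CSV."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        prompt = PROMPT_TEMPLATE.format(
            title=row["title"], abstract=row["abstract"]
        )
        yield prompt, row["label"].strip()
```

Keeping the training template identical to the inference prompt avoids a train/serve mismatch that would degrade label accuracy.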

Training Procedure

  • Method: Supervised fine-tuning (SFT) with LoRA adapters using Unsloth.
  • Objective: Generate a single token label (0/1) after the instruction prompt.

Training Hyperparameters

  • Training regime: Mixed precision (implementation-dependent; see training script)

Evaluation

Testing Data, Factors & Metrics

Testing Data

  • Source: data/test_data_LFM.csv

Metrics

  • Accuracy, precision, recall, and F1 (macro and weighted), plus class-wise metrics.
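
For a binary task the macro and weighted F1 reported below differ only in how the two class-wise scores are averaged: macro weights both classes equally, weighted scales each by its support. A self-contained sketch (equivalent to what `sklearn.metrics.f1_score` computes with `average="macro"`/`"weighted"`):

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 on empty denominators."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0


def macro_weighted_f1(y_true, y_pred):
    """Return (macro F1, support-weighted F1) for binary labels 0/1."""
    scores, supports = [], []
    for c in (0, 1):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(f1(tp, fp, fn))
        supports.append(sum(t == c for t in y_true))
    macro = sum(scores) / len(scores)
    weighted = sum(s * n for s, n in zip(scores, supports)) / len(y_true)
    return macro, weighted
```

With imbalanced screening data (more exclusions than inclusions), the macro score is the stricter summary because it does not let the majority class dominate.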

Results

From results/eval_results_lfm2.5_1.2b_instruct.json (test set size = 56):

  • Accuracy (weighted): 0.9449
  • F1 (weighted): 0.9468
  • F1 (macro): 0.9377
  • Class 0 (Exclude) F1: 0.9610
  • Class 1 (Include) F1: 0.9143

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: Not disclosed
  • Hours used: Not disclosed
  • Cloud Provider: Not disclosed
  • Compute Region: Not disclosed
  • Carbon Emitted: Not disclosed

Technical Specifications [optional]

Model Architecture and Objective

  • Base model: LFM2.5-1.2B-Instruct
  • Objective: Instruction-following generation of a binary label for screening tasks.

Compute Infrastructure

  • Frameworks: Unsloth, Transformers, TRL

Citation [optional]

If you use this model, please cite the repository:

BibTeX:

@misc{llm_systematic_review_2026,
    title        = {LLM Systematic Review Classifier},
    author       = {Intelligent Agents Research Group},
    year         = {2026},
    howpublished = {\url{https://github.com/Intelligent-Agents-Research-Group/llm_systematic_review}}
}

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]

Framework versions

  • PEFT 0.18.1