bert-mini-squadv2

This model is a fine-tuned version of microsoft/MiniLM-L12-H384-uncased on the hf-tuner/squad_v2.0.1 dataset.

It achieves the following results on the evaluation set:

  • Loss: 1.4653
  • Exact Match Accuracy: 62.95%

Evaluation Notes

Issues with Exact Match Evaluation

Several correct predictions were scored as incorrect because strict exact-match scoring is sensitive to minor differences in tokenization, formatting, or span boundaries:

  • Predicted: schrodinger equation → Rejected (expected: schrödinger equation)
  • Predicted: feynman diagrams → Rejected (expected: feynman)
  • Predicted: electromagnetic force → Rejected (expected: electromagnetic)
  • Predicted: 45 000 pounds → Rejected (expected: 45000 pounds)
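The mismatches listed above can be quantified with a more forgiving comparison. The following is a minimal sketch, not the scoring used to produce the numbers in this card: it follows the general shape of SQuAD-style answer normalization and token-level F1, and adds two assumptions of its own (accent folding and merging digit groups separated by whitespace). The function names are hypothetical.

```python
import re
import string
import unicodedata
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, fold accents, drop punctuation and articles, tidy whitespace."""
    text = unicodedata.normalize("NFKD", text.lower())
    # Assumption: strip combining marks so "schrödinger" matches "schrodinger".
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    # Assumption: merge digit groups split by whitespace ("45 000" -> "45000").
    # This is deliberately aggressive and would also merge unrelated numbers.
    text = re.sub(r"(?<=\d)\s+(?=\d)", "", text)
    return " ".join(text.split())


def relaxed_exact_match(prediction: str, reference: str) -> bool:
    """Exact match after normalization."""
    return normalize_answer(prediction) == normalize_answer(reference)


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match (the "no answer" case); otherwise a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

With this sketch, the formatting cases ("schrodinger equation", "45 000 pounds") become relaxed exact matches, while the span-boundary cases ("feynman diagrams" against "feynman") still fail exact match but receive partial credit from token_f1.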

Overall Performance

  • Exact-match accuracy: 62.95%
  • The model often extracts semantically correct answer spans even when exact-match scoring rejects them.
  • Primary limitation: performance drops on questions requiring deep domain-specific knowledge, largely attributable to the model's relatively small size and limited parameter capacity.

Recommendations for Best Results

  • Use clear, straightforward phrasing in queries to maximize extraction accuracy.

Model description

The base model, MiniLMv1-L12-H384-uncased, has 12 layers, a hidden size of 384, 12 attention heads, and 33M parameters, and is reported to be 2.7x faster than BERT-Base.

Direct Use

  • Extractive Question Answering: Given a passage and a question, the model extracts the most likely span of text that answers the question.
  • Handles unanswerable questions by predicting "no answer" when appropriate.
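One common way extractive QA models decide between a span and "no answer" is to compare the best span score against a "null" score taken at the first token. The sketch below illustrates that general idea on plain lists of logits. It is an assumption-laden illustration, not this model's confirmed decoding rule: the usage snippet in this card applies a simpler check (start index equal to end index). The function name, max_answer_len, and null_threshold are hypothetical, and the sketch does not distinguish context tokens from question tokens.

```python
from typing import List, Optional, Tuple


def select_span(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 30,
    null_threshold: float = 0.0,
) -> Optional[Tuple[int, int]]:
    """Return the best (start, end) token span, end inclusive, or None for "no answer"."""
    # Assumption: position 0 holds the [CLS] token and represents "no answer".
    null_score = start_logits[0] + end_logits[0]

    best_score = float("-inf")
    best_span = None
    for start in range(1, len(start_logits)):
        # Only consider ends at or after the start, within the length limit.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_logits[start] + end_logits[end]
            if score > best_score:
                best_score, best_span = score, (start, end)

    # Predict "no answer" when the null score beats the best span by the margin.
    if best_span is None or null_score - best_score > null_threshold:
        return None
    return best_span
```

Raising null_threshold makes the sketch more willing to return a span; lowering it makes "no answer" more likely.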

Downstream Use

The model can be integrated into chatbots, virtual assistants, or search systems that require question answering over text.

Out-of-Scope Use

  • Generative question answering (the model cannot generate new answers).
  • Non-English tasks (the model was trained only on English data).
  • Open-domain QA across large corpora (the model works best when the context passage is provided).

How to use

import torch
from transformers import BertForQuestionAnswering, AutoTokenizer

model_id = 'hf-tuner/bert-mini-squadv2'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = AutoTokenizer.from_pretrained(model_id)
bert_qa = BertForQuestionAnswering.from_pretrained(model_id).to(device)
if device == 'cuda':
  # Use float16 only on GPU; half-precision CPU kernels may be slow or unavailable.
  bert_qa = bert_qa.half()
bert_qa.eval()

def get_answers(ctxq):
  """Answer a batch of [context, question] pairs; returns one string per pair."""
  inputs = tokenizer(ctxq, padding=True, return_tensors='pt').to(device)

  with torch.no_grad():
    outputs = bert_qa(**inputs)

  # Most likely start and end token positions for each pair
  start_idxs = outputs.start_logits.argmax(dim=-1)
  end_idxs = outputs.end_logits.argmax(dim=-1)

  predictions = []
  for i, (start_idx, end_idx) in enumerate(zip(start_idxs, end_idxs)):
    if start_idx == end_idx:
      # Identical start and end positions are treated as "no answer"
      predictions.append("<no_answer>")
    else:
      predict_answer_tokens = inputs['input_ids'][i, start_idx : end_idx]
      pred_answer = tokenizer.decode(predict_answer_tokens)
      predictions.append(pred_answer)
  return predictions


context = """In Q3 2024, xAI raised $6 billion in a Series C round led by Valor Equity Partners and Andreessen Horowitz, with participation from Sequoia Capital, Fidelity, and Saudi Arabia’s Kingdom Holding Company, bringing its post-money valuation to $50 billion.
"""
question_1 = "Which two investors co-led xAI’s $6 billion Series C round announced in Q3 2024?"
question_2 = "On what exact date in Q3 2024 was xAI’s $6 billion Series C funding round officially closed?"

get_answers([
    [context, question_1],
    [context, question_2],
])

>>> ['valor equity partners and andreessen horowitz', '<no_answer>']

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 2
  • mixed_precision_training: Native AMP
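For readers who want to reproduce a similar run, the listed hyperparameters map roughly onto keyword arguments of transformers.TrainingArguments. The sketch below is an assumed mapping, not the author's training script: output_dir is a placeholder, "Native AMP" is assumed to mean fp16=True, and the helper name build_training_arguments is hypothetical. Settings not listed in this card are left at the library defaults.

```python
# Assumed mapping of the listed hyperparameters onto TrainingArguments kwargs.
training_kwargs = {
    "output_dir": "bert-mini-squadv2",  # placeholder, not stated in the card
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 2,
    "fp16": True,  # assumed meaning of "Native AMP"; needs a CUDA device
}


def build_training_arguments(**overrides):
    """Build TrainingArguments from the mapping above (requires transformers)."""
    from transformers import TrainingArguments

    return TrainingArguments(**{**training_kwargs, **overrides})
```

On a machine without a GPU, calling build_training_arguments(fp16=False, optim="adamw_torch") is a reasonable way to try the configuration, at the cost of no longer matching the listed settings exactly.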

Training results

Training Loss   Epoch   Step    Validation Loss
1.3678          1.0     8134    1.4974
1.1809          2.0     16268   1.4653
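To relate the step counts in these results to the linear scheduler, the learning rate at any step can be estimated as follows. This is a sketch that assumes no warmup (the card lists no warmup setting) and takes the total of 16268 steps from the results above; the function name is hypothetical.

```python
def linear_lr(
    step: int,
    base_lr: float = 2e-5,
    total_steps: int = 16268,
    warmup_steps: int = 0,  # assumption: the card does not list any warmup
) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

Under these assumptions the rate at the end of the first epoch (step 8134) is half the initial value, about 1e-05, and it reaches zero at step 16268.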

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.8.0+cu126
  • Datasets 4.0.0
  • Tokenizers 0.22.1

Model size: 33.2M params (Safetensors, tensor type F32)