bert-mini-squadv2

This model is a fine-tuned version of microsoft/MiniLM-L12-H384-uncased on the hf-tuner/squad_v2.0.1 dataset.

It achieves the following results on the evaluation set:

Loss: 1.4653
Exact Match Accuracy: 62.95%

Evaluation Notes

Issues with Exact Match Evaluation

Several correct predictions were counted as errors because the strict exact-match criterion is sensitive to minor differences in tokenization, formatting, or span boundaries:

Predicted: schrodinger equation → Rejected (expected: schrödinger equation)
Predicted: feynman diagrams → Rejected (expected: feynman)
Predicted: electromagnetic force → Rejected (expected: electromagnetic)
Predicted: 45 000 pounds → Rejected (expected: 45000 pounds)
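
A more lenient scoring pass can reduce these false rejections. The following is a minimal sketch, not the evaluation code used for this model: `normalize_answer` follows the spirit of the usual SQuAD answer normalization (lowercasing, removing punctuation and articles), and additionally strips accents and joins space-separated digit groups, both of which are assumptions added here to cover the cases listed above. `token_f1` gives partial credit when span boundaries differ. Both function names are hypothetical.

```python
import re
import string
import unicodedata
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, strip accents, punctuation and articles, normalize spacing."""
    # Decompose accented characters and drop the combining marks (ö -> o).
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Remove ASCII punctuation.
    text = "".join(ch for ch in text if ch not in string.punctuation)
    # Drop English articles.
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    # Heuristic (assumption): join digit groups split by a space, "45 000" -> "45000".
    text = re.sub(r"(?<=\d)\s+(?=\d{3}\b)", "", text)
    # Collapse remaining whitespace.
    return " ".join(text.split())


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

With this normalization, "schrodinger equation" and "schrödinger equation" compare equal, as do "45 000 pounds" and "45000 pounds". Span-boundary cases such as "feynman diagrams" against "feynman" still fail exact match but receive partial credit from `token_f1`. The digit-joining rule is a heuristic and may merge unrelated adjacent numbers.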

Overall Performance

Exact-match accuracy: approximately 63% (62.95% on the evaluation set)
The model frequently extracts high-quality, semantically correct answer spans even when exact-match evaluation penalizes them.
Primary limitation: performance drops on questions requiring deep domain-specific knowledge, largely attributable to the model's relatively small size and limited parameter capacity.

Recommendations for Best Results

Use clear, straightforward phrasing in queries to maximize extraction accuracy.

Model description

MiniLMv1-L12-H384-uncased: 12-layer, 384-hidden, 12-heads, 33M parameters, 2.7x faster than BERT-Base

Direct Use

Extractive Question Answering: Given a passage and a question, the model extracts the most likely span of text that answers the question.
Handles unanswerable questions by predicting "no answer" when appropriate.

Downstream Use

Can be integrated into chatbots, virtual assistants, or search systems that require question answering over text.

Out-of-Scope Use

Generative question answering (the model cannot generate new answers).
Non-English tasks (the model was trained only on English data).
Open-domain QA across large corpora (the model works best when the context passage is provided).

How to use

import torch
from transformers import BertForQuestionAnswering, AutoTokenizer

model_id = 'hf-tuner/bert-mini-squadv2'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = AutoTokenizer.from_pretrained(model_id)
bert_qa = BertForQuestionAnswering.from_pretrained(model_id).to(device)
if device == 'cuda':
  # Use half precision only on GPU; fp16 is poorly supported on CPU.
  bert_qa = bert_qa.half()

def get_answers(ctxq):
  # ctxq is a list of [context, question] pairs, encoded together as one batch.
  inputs = tokenizer(ctxq, padding=True, truncation=True, return_tensors='pt').to(device)

  with torch.no_grad():
    outputs = bert_qa(**inputs)

  # Most likely start and end token position for each pair in the batch.
  start_idxs = outputs.start_logits.argmax(dim=-1)
  end_idxs = outputs.end_logits.argmax(dim=-1)

  predictions = []
  for i, (start_idx, end_idx) in enumerate(zip(start_idxs, end_idxs)):
    # An empty span (end at or before start) is treated as unanswerable.
    if end_idx <= start_idx:
      predictions.append("<no_answer>")
    else:
      # Tokens from start_idx up to, but not including, end_idx.
      predict_answer_tokens = inputs['input_ids'][i, start_idx : end_idx]
      pred_answer = tokenizer.decode(predict_answer_tokens)
      predictions.append(pred_answer)
  return predictions


context = """In Q3 2024, xAI raised $6 billion in a Series C round led by Valor Equity Partners and Andreessen Horowitz, with participation from Sequoia Capital, Fidelity, and Saudi Arabia’s Kingdom Holding Company, bringing its post-money valuation to $50 billion.
"""
question_1 = "Which two investors co-led xAI’s $6 billion Series C round announced in Q3 2024?"
question_2 = "On what exact date in Q3 2024 was xAI’s $6 billion Series C funding round officially closed?"

get_answers([
    [context, question_1],
    [context, question_2],
])

>>> ['valor equity partners and andreessen horowitz', '<no_answer>']
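
The snippet above picks the start and end positions independently with argmax, which can occasionally produce an end position before the start. A common alternative is to score candidate pairs jointly and keep the best one with start <= end. The sketch below illustrates that idea on plain Python lists; `best_span`, `max_answer_len`, and `top_k` are hypothetical names, and this is not the decoding used to produce the results reported in this card.

```python
def best_span(start_logits, end_logits, max_answer_len=30, top_k=20):
    """Return (start, end, score) for the best pair with start <= end, or None."""
    # Candidate positions: the top_k highest-scoring starts and ends.
    top_starts = sorted(range(len(start_logits)),
                        key=lambda i: start_logits[i], reverse=True)[:top_k]
    top_ends = sorted(range(len(end_logits)),
                      key=lambda i: end_logits[i], reverse=True)[:top_k]

    best = None
    for s in top_starts:
        for e in top_ends:
            # Skip pairs that run backwards or are longer than allowed.
            if e < s or e - s > max_answer_len:
                continue
            score = start_logits[s] + end_logits[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best


# Independent argmax would give start=1, end=0 here, which is not a valid span.
start_logits = [0.1, 5.0, 0.2, 0.3]
end_logits = [4.0, 0.1, 3.0, 0.2]
span = best_span(start_logits, end_logits)
```

To use this with the model, pass `outputs.start_logits[i].tolist()` and `outputs.end_logits[i].tolist()` for each example. How the returned indices map to tokens, and which pair signals an unanswerable question, should follow the same convention as `get_answers`, which treats an empty span as no answer and slices up to but not including the end index.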

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 2
mixed_precision_training: Native AMP
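
For reference, these settings could be expressed as keyword arguments for `transformers.TrainingArguments`, roughly as sketched below. This is a reconstruction, not the original training script: `output_dir` is a hypothetical path, the batch sizes are assumed to be per-device values, and "Native AMP" is assumed to mean fp16 rather than bf16.

```python
# Hyperparameters from the list above, as TrainingArguments keyword arguments.
training_kwargs = dict(
    output_dir="minilm-squadv2",       # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=16,    # assumption: reported size is per device
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=2,
    fp16=True,                         # assumption: "Native AMP" taken as fp16
)

# With transformers installed and a CUDA device available:
#   from transformers import TrainingArguments
#   args = TrainingArguments(**training_kwargs)
```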