Fine-Tuned T5-Small for TriviaQA

This repository contains a T5-small model fine-tuned on approximately 20,000 cleaned question-answer pairs from the TriviaQA dataset.

The primary goal of this project was educational: to practice dataset preprocessing, learn the workflow for fine-tuning sequence-to-sequence models, and test the factual question-answering abilities of T5.

Note: This model is not intended for production-level accuracy.

Overview

Base Model: t5-small (~60.5M parameters, FP32 weights)

Task: Abstractive Question Answering (short-form trivia)

Training Data: ~20,000 samples from TriviaQA

Expected Output: Answers are typically 1–3 words
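
For reference, here is a minimal inference sketch. It assumes the checkpoint is published under the repo id prajwalmani/t5-small-trivia-qa and that inputs were prefixed with "question: " during fine-tuning; the exact prompt format is not documented above, so treat both as assumptions.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed repo id and input prefix; adjust to match your own setup.
model_name = "prajwalmani/t5-small-trivia-qa"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

question = "question: Who wrote the novel 1984?"
inputs = tokenizer(question, return_tensors="pt")
# Answers are typically 1-3 words, so a small generation budget is enough.
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```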

Training Details

Epochs: 3

Batch Size: 16

Hardware: NVIDIA GTX 1050 (4GB VRAM)
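
The actual training script is not included in this repository. The following is a minimal sketch of how a comparable run could look with the Hugging Face Trainer API, assuming the trivia_qa dataset's "rc.nocontext" config and a "question: " input prefix (both are assumptions; the cleaning steps used for the real run are not documented here).

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# ~20k QA pairs, matching the sample size described above.
raw = load_dataset("trivia_qa", "rc.nocontext", split="train[:20000]")

def preprocess(batch):
    # Assumed prompt format: "question: <question>" -> short answer string.
    inputs = tokenizer(
        ["question: " + q for q in batch["question"]],
        max_length=64, truncation=True,
    )
    labels = tokenizer(
        [a["value"] for a in batch["answer"]],
        max_length=16, truncation=True,
    )
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="t5-small-trivia-qa",
    num_train_epochs=3,              # matches the card
    per_device_train_batch_size=16,  # matches the card
    fp16=False,                      # the GTX 1050 has limited fp16 support
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    # The collator pads labels with -100 so they are ignored by the loss.
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```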

Limitations and Behavior

This model was trained as an experiment and has significant limitations.

Limited Factual Memory: The t5-small model is not large enough to store a vast amount of "world knowledge."

Small Dataset: 20k training examples cover only a tiny slice of possible trivia topics; fine-tuning cannot teach the model facts that appear in neither its pre-training data nor the fine-tuning set.

No Retrieval: This is a standard (non-RAG) model. It cannot "look up" answers from an external source like Google or Wikipedia.

Potential for Hallucination: The model may guess or provide a confident-sounding but incorrect answer, especially for questions outside its training data.

This behavior is expected for a small encoder-decoder model trained on a limited dataset.

How to Improve

To build a robust and accurate trivia bot, the following steps would be necessary:

Implement RAG: Add a retrieval-augmented generation (RAG) pipeline. This would let the model look up relevant context in an external knowledge source (e.g., a Wikipedia dump or a web search API) before formulating an answer; a toy sketch follows this list.

Use a Larger Model: Start with a more capable base model, such as Flan-T5-Large, Flan-T5-XL, or a modern decoder-based model (e.g., Mistral, Llama 3).

Use the Full Dataset: Train on the complete TriviaQA dataset.

Prompt Engineering: Use stricter, more detailed prompts to force the model to generate only short, precise answers.
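
As a toy illustration of the first and last points above, the sketch below retrieves a short Wikipedia summary as context and wraps the question in a stricter prompt. The wikipedia package is one hypothetical retriever choice (a real RAG system would use a proper search index), and since this checkpoint was not fine-tuned on context-augmented inputs, its outputs here would likely still be weak; the pattern is what matters.

```python
import wikipedia  # pip install wikipedia; hypothetical retriever choice
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "prajwalmani/t5-small-trivia-qa"  # or a larger Flan-T5 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def answer(question: str) -> str:
    # Retrieve: use the top Wikipedia search hit as context.
    page = wikipedia.search(question, results=1)[0]
    context = wikipedia.summary(page, sentences=2)
    # Stricter prompt: demand a short answer grounded in the context.
    prompt = f"Answer in 1-3 words. question: {question} context: {context}"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    outputs = model.generate(**inputs, max_new_tokens=8)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(answer("Who painted the Mona Lisa?"))
```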