Model Card for ShAIkespear/Phi-2_DPO_M2

A LoRA-finetuned version of microsoft/phi-2 specialized for multiple-choice question answering (MCQA), particularly in STEM domains. This model represents the Direct Preference Optimization (DPO) phase of the ShAIkespear project, built on top of the project's supervised finetuning (SFT) baseline. It provides solid baseline reasoning and generates structured answers in the format ### Question → ### Explanation → ### Answer.

Model Details

  • Developed by: ShAIkespear team
  • Shared by: ShAIkespear team
  • Model type: Causal LM (decoder-only, Phi-2 backbone)
  • Languages: English
  • License: MIT
  • Finetuned from: microsoft/phi-2

Model Sources

  • Repository: 2.8B-Phi-2-LLM-QA
  • Report: “ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”

Uses

Direct Use

  • MCQA answering for math, science, and general knowledge (e.g., MMLU, ScienceQA); see the prompt sketch after this list.
  • Educational use cases or tutoring tasks requiring short, structured explanations.
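
For multiple-choice inputs, a prompt in the card's header format might look like the following. This is a hypothetical example; the A–D option labels are an assumption, since the exact option formatting used during training is not shown in this card.

# Hypothetical MCQA prompt; ending at "### Explanation:" lets the model
# generate the explanation and then the answer.
prompt = (
    "### Question: Which planet is known as the Red Planet?\n"
    "A. Venus\nB. Mars\nC. Jupiter\nD. Saturn\n"
    "### Explanation:"
)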

Out-of-Scope Use

  • Open-ended factual QA or free-form reasoning outside multiple-choice formats.
  • Autonomous grading or unsupervised assessment systems.

Bias, Risks, and Limitations

  • Reasoning depth: Limited for complex math proofs or open-ended reasoning.
  • Data bias: MCQA datasets may reflect selection bias toward academic subjects.
  • Overfitting: Strong performance on training-style questions, weaker generalization to unseen formats.

How to Get Started

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model (device_map="auto" requires the accelerate package).
tok = AutoTokenizer.from_pretrained("ShAIkespear/Phi-2_DPO_M2")
model = AutoModelForCausalLM.from_pretrained("ShAIkespear/Phi-2_DPO_M2", torch_dtype="auto", device_map="auto")

# The prompt follows the card's structured header format; the model completes the answer.
prompt = "### Question: What is the chemical formula of water?\n### Explanation: Water is composed of hydrogen and oxygen.\n### Answer:"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=10)
print(tok.decode(out[0], skip_special_tokens=True))
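
If the repository ships only LoRA adapter weights rather than a merged checkpoint, the adapter can instead be attached to the Phi-2 backbone with PEFT. A minimal sketch, assuming a standard PEFT adapter layout in the repo:

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the base model, then attach the LoRA adapter weights on top of it.
base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, "ShAIkespear/Phi-2_DPO_M2")
tok = AutoTokenizer.from_pretrained("microsoft/phi-2")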

Training Details

  • Data: MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K, EPFL-curated exam questions.
  • Format: Unified MCQA schema with explicit prompt headers.
  • LoRA: rank=16, α=16, dropout=0.05
  • Batch size: 4 (train/eval)
  • LR: 1e-5 (public data)
  • Frameworks: Hugging Face TRL + PEFT/LoRA (see the configuration sketch below)
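
As a rough illustration, the hyperparameters above map onto a PEFT configuration like the following sketch; target_modules is an assumption (common choices for Phi-2's attention and dense projections) and is not documented in this card:

from peft import LoraConfig

lora_cfg = LoraConfig(
    r=16,               # LoRA rank, as reported above
    lora_alpha=16,      # scaling factor alpha
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],  # assumed; not stated in the card
    task_type="CAUSAL_LM",
)

A configuration like this would be passed to TRL's DPOTrainer (peft_config=lora_cfg) along with the reported per-device batch size of 4 and learning rate of 1e-5.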

Evaluation Summary

  • Strengths: Solid MCQA performance with structured, explainable outputs.
  • Limitations: May still produce verbose or redundant text, particularly outside MCQA formats.
  • Use case: Preference-aligned MCQA answering; the DPO-stage counterpart (as in the Anton variants) of the project's SFT baseline.

Technical Specifications

  • Architecture: Phi-2 decoder-only transformer (~2.78B params).
  • Objective: Direct Preference Optimization (DPO) applied on top of a supervised finetuned (SFT) checkpoint; the loss is given below.
  • Software: Hugging Face Transformers, TRL, PEFT.
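
For reference, the standard DPO objective (Rafailov et al., 2023), which TRL's DPOTrainer implements, is:

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

where y_w and y_l are the preferred and rejected responses, π_ref is the frozen reference (SFT) policy, σ is the logistic sigmoid, and β controls how far the tuned policy may drift from the reference.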

Glossary

  • MCQA: Multiple-Choice Question Answering
  • SFT: Supervised Finetuning
  • DPO: Direct Preference Optimization
  • LoRA: Low-Rank Adaptation