Model Card for ShAIkespear/Phi-2_DPO_M2

A LoRA-finetuned version of microsoft/phi-2 specialized for multiple-choice question answering (MCQA), particularly in STEM domains. This model represents the Direct Preference Optimization (DPO) phase of the ShAIkespear project, built on top of the project's supervised finetuning (SFT) baseline. It provides solid baseline reasoning and generates structured answers in the format ### Question → ### Explanation → ### Answer.

Model Details

  • Developed by: ShAIkespear team
  • Shared by: ShAIkespear team
  • Model type: Causal LM (decoder-only, Phi-2 backbone)
  • Languages: English
  • License: MIT
  • Finetuned from: microsoft/phi-2

Model Sources

  • Repository: 2.8B-Phi-2-LLM-QA
  • Report: “ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”

Uses

Direct Use

  • MCQA answering for math, science, and general knowledge (e.g., MMLU, ScienceQA); see the prompt sketch after this list.
  • Educational use cases or tutoring tasks requiring short, structured explanations.
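
For multiple-choice inputs, a prompt in the card's header format might look like the following. This is a hypothetical example; the A–D option labels are an assumption, since the exact option formatting used during training is not shown in this card.

# Hypothetical MCQA prompt; ending at "### Explanation:" lets the model
# generate the explanation and then the answer.
prompt = (
    "### Question: Which planet is known as the Red Planet?\n"
    "A. Venus\nB. Mars\nC. Jupiter\nD. Saturn\n"
    "### Explanation:"
)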

Out-of-Scope Use

  • Open-ended factual QA or free-form reasoning outside multiple-choice formats.
  • Autonomous grading or unsupervised assessment systems.

Bias, Risks, and Limitations

  • Reasoning depth: Limited for complex math proofs or open-ended reasoning.
  • Data bias: MCQA datasets may reflect selection bias toward academic subjects.
  • Overfitting: Strong performance on training-style questions, weaker generalization to unseen formats.

How to Get Started

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model (device_map="auto" requires the accelerate package).
tok = AutoTokenizer.from_pretrained("ShAIkespear/Phi-2_DPO_M2")
model = AutoModelForCausalLM.from_pretrained("ShAIkespear/Phi-2_DPO_M2", torch_dtype="auto", device_map="auto")

# The prompt follows the card's structured header format; the model completes the answer.
prompt = "### Question: What is the chemical formula of water?\n### Explanation: Water is composed of hydrogen and oxygen.\n### Answer:"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=10)
print(tok.decode(out[0], skip_special_tokens=True))
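
If the repository ships only LoRA adapter weights rather than a merged checkpoint, the adapter can instead be attached to the Phi-2 backbone with PEFT. A minimal sketch, assuming a standard PEFT adapter layout in the repo:

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the base model, then attach the LoRA adapter weights on top of it.
base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, "ShAIkespear/Phi-2_DPO_M2")
tok = AutoTokenizer.from_pretrained("microsoft/phi-2")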

Training Details

  • Data: MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K, EPFL-curated exam questions.
  • Format: Unified MCQA schema with explicit prompt headers.
  • LoRA: rank=16, α=16, dropout=0.05
  • Batch size: 4 (train/eval)
  • LR: 1e-5 (public data)
  • Frameworks: Hugging Face TRL + PEFT/LoRA (see the configuration sketch below)
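
As a rough illustration, the hyperparameters above map onto a PEFT configuration like the following sketch; target_modules is an assumption (common choices for Phi-2's attention and dense projections) and is not documented in this card:

from peft import LoraConfig

lora_cfg = LoraConfig(
    r=16,               # LoRA rank, as reported above
    lora_alpha=16,      # scaling factor alpha
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],  # assumed; not stated in the card
    task_type="CAUSAL_LM",
)

A configuration like this would be passed to TRL's DPOTrainer (peft_config=lora_cfg) along with the reported per-device batch size of 4 and learning rate of 1e-5.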

Evaluation Summary

  • Strengths: Solid MCQA performance with structured, explainable outputs.
  • Limitations: May still produce verbose or redundant text, particularly outside MCQA formats.
  • Use case: Preference-aligned MCQA answering; the DPO-stage counterpart (as in the Anton variants) of the project's SFT baseline.

Technical Specifications

  • Architecture: Phi-2 decoder-only transformer (~2.78B params).
  • Objective: Direct Preference Optimization (DPO) applied on top of a supervised finetuned (SFT) checkpoint; the loss is given below.
  • Software: Hugging Face Transformers, TRL, PEFT.
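
For reference, the standard DPO objective (Rafailov et al., 2023), which TRL's DPOTrainer implements, is:

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

where y_w and y_l are the preferred and rejected responses, π_ref is the frozen reference (SFT) policy, σ is the logistic sigmoid, and β controls how far the tuned policy may drift from the reference.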

Glossary

  • MCQA: Multiple-Choice Question Answering
  • SFT: Supervised Finetuning
  • DPO: Direct Preference Optimization
  • LoRA: Low-Rank Adaptation