Model Card for ShAIkespear/Phi-2_DPO_M3_Base_Alt
A LoRA-finetuned and **Direct Preference Optimization (DPO)**–aligned variant of microsoft/phi-2, specialized for multiple-choice question answering (MCQA) with an emphasis on STEM and general knowledge domains.
This model is the alternative base configuration of the final M3 (balanced-then-DPO) training pipeline from the ShAIkespear project. It is kept in full precision (no 8-bit quantization) for highest fidelity and to support further fine-tuning.
Model Details
- Developed by: ShAIkespear team
- Shared by: ShAIkespear team
- Model type: Causal LM (Phi-2) with LoRA adapters; DPO-aligned
- Languages: English
- License: MIT
- Finetuned from: microsoft/phi-2
Model Sources
- Repository: 2.8B-Phi-2-LLM-QA
- Report: “ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”
Uses
Direct Use
- MCQA and educational Q&A (MMLU, OpenBookQA, ScienceQA).
- Alignment research — comparison between DPO training setups (Base vs. Quantized).
- As a high-fidelity reference checkpoint for quantized and downstream variants.
Out-of-Scope Use
- High-stakes or safety-critical applications (medical, legal, policy).
- Generative tasks outside multiple-choice reasoning.
- Misuse for automated exam solving or in ways that could leak confidential data.
Bias, Risks, and Limitations
- Domain bias: Stronger on factual MCQA, weaker on advanced reasoning tasks.
- Answer drift: may occasionally produce verbose answers or unsolicited follow-ups when the prompt lacks explicit formatting cues.
- Data source risks: EPFL-derived preferences may encode narrow style biases.
Recommendations
Maintain the structured prompt format shown below; a small prompt-building sketch follows these recommendations.

```text
### Question ...
### Explanation ...
### Answer:
```

- Keep human supervision in any educational or grading use.
- Prefer this full-precision model for fine-tuning or evaluation; use quantized versions for deployment.
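As an illustration of that template, here is a minimal prompt-building sketch; the helper name `build_mcqa_prompt` and the lettered answer options are illustrative assumptions, not part of the released code.

```python
# Illustrative helper (not from the released code): builds a prompt in the
# "### Question / ### Explanation / ### Answer" format the model was trained on.
def build_mcqa_prompt(question: str, options: list[str], explanation: str = "") -> str:
    # Label options A, B, C, ... so the model can answer with a single letter.
    lettered = "\n".join(f"{chr(ord('A') + i)}. {opt}" for i, opt in enumerate(options))
    return (
        f"### Question: {question}\n{lettered}\n"
        f"### Explanation: {explanation}\n"
        f"### Answer:"
    )

print(build_mcqa_prompt(
    "Which element has the chemical symbol 'O'?",
    ["Osmium", "Oxygen", "Gold", "Oganesson"],
    "The symbol 'O' represents this essential gas.",
))
```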
How to Get Started
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ShAIkespear/Phi-2_DPO_M3_Base_Alt"
tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
# Load in half precision (the card lists fp16/bf16); drop torch_dtype for full fp32.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# The prompt follows the "### Question / ### Explanation / ### Answer" training schema.
prompt = "### Question: Which element has the chemical symbol 'O'?\n### Explanation: The symbol 'O' represents this essential gas.\n### Answer:"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=15, do_sample=False)  # greedy decoding for a short, deterministic answer
print(tok.decode(out[0], skip_special_tokens=True))
```
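Continuing from the snippet above, MCQA scoring usually needs only the text after the final `### Answer:` marker; a minimal way to pull it out (assuming the single-letter answer convention) is:

```python
# Continue from `tok` and `out` above: keep only what follows the last
# "### Answer:" marker and trim it to the first line.
decoded = tok.decode(out[0], skip_special_tokens=True)
answer = decoded.split("### Answer:")[-1].strip().split("\n", 1)[0]
print(answer)
```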
Training Details
Training Data
- SFT stage: Balanced MCQA mix — MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K, and EPFL question sets.
- DPO stage: Human preference pairs (EPFL exams + public feedback datasets like HelpSteer).
- Schema: Unified “### Question / ### Explanation / ### Answer” format.
- Filtering: sequences of at most 512 tokens, with balanced per-dataset caps (~20k examples each); a length-filter sketch follows below.
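A rough sketch of how such a token-length filter might look; the `keep_example` helper and the use of the Phi-2 tokenizer for counting are assumptions, not the project's actual preprocessing code.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/phi-2", use_fast=True)

def keep_example(text: str, max_tokens: int = 512) -> bool:
    # Keep only formatted examples whose tokenized length fits the 512-token cap.
    return len(tok(text).input_ids) <= max_tokens

examples = ["### Question: ...\n### Explanation: ...\n### Answer: B"]
filtered = [ex for ex in examples if keep_example(ex)]
```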
Training Procedure
- Pipeline: SFT → DPO (M3 configuration).
- LoRA parameters: rank = 16, α = 16, dropout = 0.05.
- Batch sizes: SFT = 4; DPO = 1.
- Learning rates: 1e-5 (public) / 1e-4 (EPFL).
- Scheduler: Cosine with warmup.
- Frameworks: Hugging Face Transformers + TRL + PEFT (LoRA); an illustrative configuration sketch follows below.
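A minimal sketch of how the DPO stage could be configured with PEFT and TRL using the hyperparameters above; the target modules, warmup ratio, toy preference pairs, and output directory are assumptions, and the exact DPOTrainer keyword names vary slightly across TRL versions.

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
tok = AutoTokenizer.from_pretrained("microsoft/phi-2")
tok.pad_token = tok.eos_token  # Phi-2's tokenizer has no pad token by default

# LoRA settings from the card: rank 16, alpha 16, dropout 0.05.
# target_modules is an assumption for Phi-2's attention layers.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
)

# DPO stage per the card: batch size 1, lr 1e-5 (public data), cosine schedule with warmup.
dpo_args = DPOConfig(
    output_dir="phi2-dpo-sketch",      # placeholder path
    per_device_train_batch_size=1,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                  # assumption: exact warmup not stated in the card
    max_steps=10,                      # keep the sketch short
)

# Toy preference pairs in the prompt/chosen/rejected layout that DPOTrainer expects.
prefs = Dataset.from_dict({
    "prompt": ["### Question: 2 + 2 = ?\n### Answer:"],
    "chosen": [" 4"],
    "rejected": [" 5"],
})

trainer = DPOTrainer(model=model, args=dpo_args, train_dataset=prefs,
                     processing_class=tok, peft_config=lora_cfg)
trainer.train()
```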
Evaluation Summary
- Configuration: M3 Base (Alt) is the unquantized reference model for the quantized 8-bit variant.
- Performance: Balanced dataset improves cross-domain consistency; DPO enhances answer formatting and style alignment.
- Accuracy: similar to the quantized model (~0.61 MMLU average), slightly higher on reasoning subtasks.
- Use case: experimentation, evaluation, or further domain-specific fine-tuning; a minimal accuracy check is sketched below.
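As a purely illustrative accuracy check (the letters below are made up, not actual predictions):

```python
# Score single-letter predictions against gold answers; toy data only.
preds = ["B", "C", "A", "D"]
golds = ["B", "C", "B", "D"]
accuracy = sum(p == g for p, g in zip(preds, golds)) / len(golds)
print(f"accuracy = {accuracy:.2f}")  # 0.75 on this toy set
```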
Technical Specifications
- Architecture: Phi-2 (~2.78B parameters), decoder-only transformer.
- Objective: SFT next-token prediction + DPO preference alignment.
- Precision: Full precision (fp16/bf16).
- Software: Hugging Face Transformers, TRL, PEFT.
Glossary
- MCQA: Multiple-Choice Question Answering
- SFT: Supervised Finetuning
- DPO: Direct Preference Optimization
- LoRA: Low-Rank Adaptation
- Alt (Alternative): Internal naming for the alternate full-precision checkpoint variant of M3