Model Card for ShAIkespear/Phi-2_DPO_M3_Base_Alt

A LoRA-finetuned and **Direct Preference Optimization (DPO)**–aligned variant of microsoft/phi-2, specialized for multiple-choice question answering (MCQA) with an emphasis on STEM and general-knowledge domains. This model is the alternative base configuration of the final M3 (balanced-then-DPO) training pipeline from the ShAIkespear project. It is kept at full precision (no 8-bit quantization) for highest fidelity and for further fine-tuning.


Model Details

  • Developed by: ShAIkespear team
  • Shared by: ShAIkespear team
  • Model type: Causal LM (Phi-2) with LoRA adapters; DPO-aligned
  • Languages: English
  • License: MIT
  • Finetuned from: microsoft/phi-2

Model Sources

  • Repository: 2.8B-Phi-2-LLM-QA
  • Report: “ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”

Uses

Direct Use

  • MCQA and educational Q&A (MMLU, OpenBookQA, ScienceQA).
  • Alignment research — comparison between DPO training setups (Base vs. Quantized).
  • As a high-fidelity reference checkpoint for quantized and downstream variants.

Out-of-Scope Use

  • High-stakes or safety-critical applications (medical, legal, policy).
  • Generative tasks outside multiple-choice reasoning.
  • Misuse in automated exam solving or confidential data leakage.

Bias, Risks, and Limitations

  • Domain bias: Stronger on factual MCQA, weaker on advanced reasoning tasks.
  • Answer drift: May occasionally produce verbose or follow-up answers, especially when the explicit prompt formatting is not used.
  • Data source risks: EPFL-derived preferences may encode narrow style biases.

Recommendations

  • Maintain the structured prompt format (a prompt-building sketch follows this list):

    ### Question ...
    ### Explanation ...
    ### Answer:
    
  • Keep human supervision in any educational or grading use.

  • Prefer this full-precision model for fine-tuning or evaluation; use quantized versions for deployment.
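
A minimal prompt-building helper consistent with the format above (a sketch only; the helper name and the choice formatting are assumptions, not part of the released code):

def build_prompt(question, explanation="", choices=None):
    # Assemble the "### Question / ### Explanation / ### Answer" prompt used by this model.
    # `choices`, if given, is a list of option strings appended as "A. ...", "B. ...", etc.
    lines = [f"### Question: {question}"]
    if choices:
        lines += [f"{letter}. {text}" for letter, text in zip("ABCD", choices)]
    if explanation:
        lines.append(f"### Explanation: {explanation}")
    lines.append("### Answer:")
    return "\n".join(lines)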


How to Get Started

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ShAIkespear/Phi-2_DPO_M3_Base_Alt"

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # full-precision checkpoint; bfloat16 also works on supported hardware
    device_map="auto",
)

# The prompt follows the card's "### Question / ### Explanation / ### Answer" schema.
prompt = (
    "### Question: Which element has the chemical symbol 'O'?\n"
    "### Explanation: The symbol 'O' represents this essential gas.\n"
    "### Answer:"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=15)
print(tok.decode(out[0], skip_special_tokens=True))
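
To read off only the model's answer rather than the echoed prompt, decode just the newly generated tokens (a generic decoding pattern, not specific to this checkpoint):

# Slice off the prompt tokens and keep only the generated continuation.
answer = tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer.strip())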

Training Details

Training Data

  • SFT stage: Balanced MCQA mix — MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K, and EPFL question sets.
  • DPO stage: Human preference pairs (EPFL exams + public feedback datasets like HelpSteer).
  • Schema: Unified “### Question / ### Explanation / ### Answer” format.
  • Filtering: ≤512 tokens, balanced sample caps (~20k per dataset).
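
As an illustration of the length filter above (a sketch; the field names and exact preprocessing format are assumptions):

def within_length_budget(example, tokenizer, max_tokens=512):
    # Render the example in the unified schema and keep it only if it fits the 512-token cap.
    text = (
        f"### Question: {example['question']}\n"
        f"### Explanation: {example['explanation']}\n"
        f"### Answer: {example['answer']}"
    )
    return len(tokenizer(text)["input_ids"]) <= max_tokens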

Training Procedure

  • Pipeline: SFT → DPO (M3 configuration).
  • LoRA parameters: rank = 16, α = 16, dropout = 0.05.
  • Batch sizes: SFT = 4; DPO = 1.
  • Learning rates: 1e-5 (public) / 1e-4 (EPFL).
  • Scheduler: Cosine with warmup.
  • Frameworks: Hugging Face Transformers + TRL + PEFT (LoRA).
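
The LoRA settings and preference data above can be sketched roughly as follows with PEFT and TRL (illustrative only: the dataset contents are placeholders, and the exact DPOTrainer arguments vary across TRL versions):

from peft import LoraConfig

# LoRA adapter configuration matching the hyperparameters listed above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# TRL's DPOTrainer expects preference pairs as prompt / chosen / rejected strings,
# shown here in this card's unified MCQA schema (contents are placeholders).
preference_pair = {
    "prompt": "### Question: ...\n### Explanation: ...\n### Answer:",
    "chosen": " B",
    "rejected": " It might be B, or possibly C.",
}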

Evaluation Summary

  • Configuration: M3 Base (Alt) is the unquantized reference model for the quantized 8-bit variant.
  • Performance: Balanced dataset improves cross-domain consistency; DPO enhances answer formatting and style alignment.
  • Accuracy: Similar to the quantized model (~0.61 MMLU avg.), slightly higher on reasoning subtasks.
  • Use case: For experimentation, evaluation, or further domain-specific fine-tuning.
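
For reference, a minimal way to score MCQA outputs against gold answer letters (a generic sketch, not the project's evaluation harness):

def extract_choice(generation):
    # Return the first A-D letter appearing after the final "### Answer:" marker.
    tail = generation.split("### Answer:")[-1]
    return next((ch for ch in tail if ch in "ABCD"), None)

def mcqa_accuracy(generations, gold_letters):
    # Compare each extracted letter with the gold letter and average the matches.
    correct = sum(extract_choice(g) == gold for g, gold in zip(generations, gold_letters))
    return correct / len(gold_letters)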

Technical Specifications

  • Architecture: Phi-2 (~2.78B parameters), decoder-only transformer.
  • Objective: SFT next-token prediction + DPO preference alignment.
  • Precision: Full precision (fp16/bf16).
  • Software: Hugging Face Transformers, TRL, PEFT.
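
A quick sanity check of the reported size and precision, assuming the model object loaded in the snippet above (generic PyTorch introspection, not part of the card's tooling):

num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e9:.2f}B parameters")   # roughly 2.78B expected
print(next(model.parameters()).dtype)          # torch.float16 or torch.bfloat16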

Glossary

  • MCQA: Multiple-Choice Question Answering
  • SFT: Supervised Finetuning
  • DPO: Direct Preference Optimization
  • LoRA: Low-Rank Adaptation
  • Alt (Alternative): Internal naming for the alternate full-precision checkpoint variant of M3