Qwen3-14B LoRA for Math Misconception Detection

Model Description

This model is a QLoRA (Quantized Low-Rank Adaptation) fine-tuned version of Qwen3-14B for identifying student mathematical misconceptions from their written explanations. It was trained as part of the Kaggle MAP (Misconception Annotation Project) competition, achieving a MAP@3 score of 0.944 individually and contributing to a 0.947 ensemble solution that earned a Silver Medal (45th place).

⚠️ Important: This repository contains only LoRA adapter weights, not the full model. You must load the base model (Qwen/Qwen3-14B) and merge these adapters to use the model.
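
For context, MAP@3 (mean average precision at 3) gives full credit when the true label is the top prediction and partial credit (1/2 or 1/3) when it appears second or third. Below is a minimal sketch of the metric, not the official competition scorer:

def map_at_3(true_labels, top3_predictions):
    """Average of 1/rank of the true label within each top-3 list (0 if absent)."""
    total = 0.0
    for true_label, preds in zip(true_labels, top3_predictions):
        for rank, pred in enumerate(preds[:3], start=1):
            if pred == true_label:
                total += 1.0 / rank
                break
    return total / len(true_labels)

# Example: true label ranked 1st, 2nd, and absent -> (1 + 0.5 + 0) / 3 = 0.5
print(map_at_3([3, 7, 1], [[3, 5, 9], [2, 7, 4], [8, 0, 5]]))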

Key Features

  • 🎯 Task: Multi-class text classification for math misconception detection
  • 🧠 Architecture: Qwen3-14B with LoRA adapters (r=16, α=32)
  • 💾 Efficiency: 4-bit quantization (QLoRA) for memory-efficient training
  • 📊 Performance: 0.944 MAP@3 on validation set
  • ⚡ Training: 3 epochs, 11.5 hours on 4×L4 GPUs

Quick Start

Installation

pip install torch transformers peft bitsandbytes accelerate

Loading the Model (Method 1: Recommended)

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

# Load base model and LoRA adapters
base_model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen3-14B",
    num_labels=65,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

model = PeftModel.from_pretrained(
    base_model, 
    "jatinmehra/Qwen-3-14B-MATH-Misconception-Annotation-Project"
)

# Merge adapters for faster inference (optional but recommended)
model = model.merge_and_unload()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen3-14B",
    trust_remote_code=True
)

# The model is already placed on GPU by device_map="auto"; just set eval mode
model.eval()
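
If you intend to reload the merged model often, you can optionally save it once and load it directly afterwards. The directory name below is just an example, and note that this writes the full merged model (tens of GB), not just the adapters:

# Optional: persist the merged model so future loads skip the merge step
save_dir = "qwen3-14b-math-misconception-merged"  # example path
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)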

Loading with 4-bit Quantization (Memory Efficient)

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import PeftModel

# Configure 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Load quantized base model and LoRA adapters
base_model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen3-14B",
    num_labels=65,  # must match the number of labels used during training
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

model = PeftModel.from_pretrained(
    base_model,
    "jatinmehra/Qwen-3-14B-MATH-Misconception-Annotation-Project"
)

tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen3-14B",
    trust_remote_code=True
)

model.eval()

Inference Example

import torch

# Example input (format used during training)
question = "Which of the following is equivalent to 3(2x + 5)?"
answer = "6x + 5"
is_correct = "No"
explanation = "I distributed the 3 to 2x but forgot to distribute it to 5"

# Format input
input_text = f"""Question: {question}
Answer: {answer}
Is Correct Answer: {is_correct}
Student Explanation: {explanation}"""

# Tokenize
inputs = tokenizer(
    input_text,
    truncation=True,
    max_length=256,
    return_tensors="pt"
).to(model.device)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    probs = torch.nn.functional.softmax(logits, dim=-1)

# Get top 3 predictions
top_k = 3
top_probs, top_indices = torch.topk(probs, top_k, dim=-1)

print(f"Top {top_k} Predictions:")
for i in range(top_k):
    class_id = top_indices[0][i].item()
    confidence = top_probs[0][i].item()
    print(f"{i+1}. Class {class_id}: {confidence:.4f}")

Batch Inference

import numpy as np
import pandas as pd

def predict_batch(texts, batch_size=8):
    """Process multiple examples efficiently"""
    all_probs = []
    
    for i in range(0, len(texts), batch_size):
        batch_texts = texts[i:i+batch_size]
        
        # Tokenize batch
        inputs = tokenizer(
            batch_texts,
            truncation=True,
            max_length=256,
            padding=True,
            return_tensors="pt"
        ).to(model.device)
        
        # Inference
        with torch.no_grad():
            outputs = model(**inputs)
            probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
            all_probs.append(probs.cpu().numpy())
    
    return np.vstack(all_probs)

# Example usage
test_data = pd.read_csv("test.csv")
formatted_texts = [
    f"Question: {row['QuestionText']}\n"
    f"Answer: {row['MC_Answer']}\n"
    f"Is Correct Answer: {row['IsCorrect']}\n"
    f"Student Explanation: {row['StudentExplanation']}"
    for _, row in test_data.iterrows()
]

predictions = predict_batch(formatted_texts, batch_size=8)
top3_classes = np.argsort(-predictions, axis=1)[:, :3]
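
The integers in top3_classes index the classification head, not label strings. Mapping them back to Category:Misconception labels depends on the label encoding used during training; if the saved config carries an id2label mapping, something like the following works (an assumption to verify against your own label encoder):

# Assumes model.config.id2label was populated at training time; otherwise it
# falls back to generic names like "LABEL_0" and you need your own mapping.
id2label = model.config.id2label
top3_labels = [[id2label[int(idx)] for idx in row] for row in top3_classes]
print(top3_labels[:3])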

Training Details

Training Method: QLoRA (Quantized Low-Rank Adaptation)

This model uses QLoRA, an efficient fine-tuning technique that combines:

  1. 4-bit Quantization (NF4): Reduces memory footprint by quantizing base model weights
  2. Double Quantization: Further compresses quantization constants
  3. LoRA Adapters: Trains only small low-rank matrices instead of full model weights
  4. bfloat16 Compute: Uses bfloat16 for actual computations despite 4-bit storage

Benefits:

  • Trains a 14B-parameter model on 4×L4 GPUs (24 GB VRAM each)
  • Reduces memory by ~75% compared to full fine-tuning
  • Typically retains accuracy close to full fine-tuning (the QLoRA paper reports matching 16-bit fine-tuning quality)
  • Produces small adapter checkpoints that are cheap to store and share

LoRA Configuration

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                    # Low-rank dimension
    lora_alpha=32,          # Scaling factor
    target_modules=[        # Attention & MLP layers
        "q_proj", 
        "v_proj", 
        "o_proj", 
        "gate_proj", 
        "up_proj", 
        "down_proj"
    ],
    lora_dropout=0.1,       # Regularization
    bias="none",
    task_type="SEQ_CLS",    # Sequence classification
    modules_to_save=["score"]  # Save classification head
)
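
A minimal sketch of how this configuration is typically applied on top of the 4-bit base model before training; the exact training script is not part of this repository:

from peft import get_peft_model, prepare_model_for_kbit_training

# `base_model` here is the 4-bit quantized Qwen3-14B classification model
# loaded with the BitsAndBytesConfig shown earlier.
base_model = prepare_model_for_kbit_training(base_model)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only LoRA matrices + "score" head are trainable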

Training Hyperparameters

Hyperparameter            Value
------------------------  --------------------------------------
Base Model                Qwen/Qwen3-14B
Epochs                    3
Learning Rate             2e-4
LR Scheduler              Cosine with warmup
Warmup Ratio              0.1
Batch Size                8 per device
Gradient Accumulation     4 steps
Effective Batch Size      128 (8 × 4 devices × 4 accumulation)
Max Sequence Length       256 tokens
Precision                 bfloat16
Gradient Checkpointing    Enabled
Quantization              4-bit NF4
GPUs                      4× NVIDIA L4 (24 GB)
Training Time             11 hours 34 minutes
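
As a rough sketch, the hyperparameters above map onto a Hugging Face TrainingArguments configuration along these lines (the output directory is illustrative, and the original training script may differ in details):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-14b-map-qlora",   # illustrative path
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,      # 8 per device x 4 GPUs x 4 steps = 128 effective
    bf16=True,
    gradient_checkpointing=True,
)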

Data Format

The model expects inputs in the following format:

Question: {question_text}
Answer: {student_answer}
Is Correct Answer: {Yes/No}
Student Explanation: {student_reasoning}
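
A small helper that builds this string from the individual fields, matching the formatting used in the examples above:

def format_example(question, answer, is_correct, explanation):
    """Assemble the model input in the exact format used during training."""
    return (
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        f"Is Correct Answer: {is_correct}\n"
        f"Student Explanation: {explanation}"
    )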

Use Cases

✅ Recommended Use Cases

  • Educational Platforms: Automated feedback on student math reasoning
  • Teacher Support Tools: Identifying common misconception patterns
  • Adaptive Learning Systems: Personalizing instruction based on detected errors
  • Research: Analyzing student thinking patterns at scale
  • Assessment Tools: Diagnostic evaluation of conceptual understanding

❌ Not Recommended For

  • Math problem solving (this is a classification model, not a solver)
  • Generating explanations (classification only)
  • Non-mathematical domains
  • Grading or high-stakes testing without human review
  • Students outside 9-16 age range without validation

Ethical Considerations

Intended Use

This model is designed to support educators, not replace them. It should be used as a diagnostic tool to help teachers:

  • Identify common misconceptions quickly
  • Provide targeted feedback
  • Understand student reasoning patterns

Potential Risks

  1. Bias: May reflect biases in training data (e.g., language patterns, demographics)
  2. Misclassification: Not 100% accurate; false positives/negatives will occur
  3. Over-reliance: Should not be sole basis for educational decisions
  4. Privacy: Student data must be handled according to educational privacy regulations (FERPA, GDPR, etc.)
  5. Fairness: May perform differently across student populations

Recommendations

  • Always have human review for high-stakes decisions
  • Monitor performance across different student populations
  • Combine with other assessment methods
  • Respect student privacy and data protection laws
  • Use as one tool among many in educational practice

Citation

If you use this model in your research or application, please cite:

@misc{qwen3-14b-math-misconception-lora,
  author = {Jatin Mehra},
  title = {Qwen3-14B LoRA for Math Misconception Detection},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/jatinmehra/Qwen-3-14B-MATH-Misconception-Annotation-Project}},
  note = {Silver Medal Solution (45th place) in Kaggle MAP Competition}
}

@inproceedings{map-competition-2025,
  title = {MAP: Charting Student Math Misunderstandings},
  author = {Vanderbilt University and The Learning Agency},
  year = {2025},
  organization = {Kaggle}
}

Questions or Issues? Please open an issue on GitHub

Made with ❤️ for better math education
