Qwen3-14B LoRA for Math Misconception Detection

Model Description

This model is a QLoRA (Quantized Low-Rank Adaptation) fine-tuned version of Qwen3-14B for identifying student mathematical misconceptions from their written explanations. It was trained as part of the Kaggle MAP (Misconception Annotation Project) competition, achieving a MAP@3 score of 0.944 individually and contributing to a 0.947 ensemble solution that earned a Silver Medal (45th place).

⚠️ Important: This repository contains only LoRA adapter weights, not the full model. You must load the base model (Qwen/Qwen3-14B) and merge these adapters to use the model.
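
For context, MAP@3 (mean average precision at 3) gives full credit when the true label is the top prediction and partial credit (1/2 or 1/3) when it appears second or third. Below is a minimal sketch of the metric, not the official competition scorer:

def map_at_3(true_labels, top3_predictions):
    """Average of 1/rank of the true label within each top-3 list (0 if absent)."""
    total = 0.0
    for true_label, preds in zip(true_labels, top3_predictions):
        for rank, pred in enumerate(preds[:3], start=1):
            if pred == true_label:
                total += 1.0 / rank
                break
    return total / len(true_labels)

# Example: true label ranked 1st, 2nd, and absent -> (1 + 0.5 + 0) / 3 = 0.5
print(map_at_3([3, 7, 1], [[3, 5, 9], [2, 7, 4], [8, 0, 5]]))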

Key Features

  • 🎯 Task: Multi-class text classification for math misconception detection
  • 🧠 Architecture: Qwen3-14B with LoRA adapters (r=16, α=32)
  • 💾 Efficiency: 4-bit quantization (QLoRA) for memory-efficient training
  • 📊 Performance: 0.944 MAP@3 on validation set
  • ⚡ Training: 3 epochs, 11.5 hours on 4×L4 GPUs

Quick Start

Installation

pip install torch transformers peft bitsandbytes accelerate

Loading the Model (Method 1: Recommended)

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

# Load base model and LoRA adapters
base_model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen3-14B",
    num_labels=65,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

model = PeftModel.from_pretrained(
    base_model, 
    "jatinmehra/Qwen-3-14B-MATH-Misconception-Annotation-Project"
)

# Merge adapters for faster inference (optional but recommended)
model = model.merge_and_unload()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen3-14B",
    trust_remote_code=True
)

# The model is already placed on GPU by device_map="auto"; just set eval mode
model.eval()
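
If you intend to reload the merged model often, you can optionally save it once and load it directly afterwards. The directory name below is just an example, and note that this writes the full merged model (tens of GB), not just the adapters:

# Optional: persist the merged model so future loads skip the merge step
save_dir = "qwen3-14b-math-misconception-merged"  # example path
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)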

Loading with 4-bit Quantization (Memory Efficient)

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import PeftModel

# Configure 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Load quantized base model and LoRA adapters
base_model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen3-14B",
    num_labels=65,  # must match the number of labels used during training
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

model = PeftModel.from_pretrained(
    base_model,
    "jatinmehra/Qwen-3-14B-MATH-Misconception-Annotation-Project"
)

tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen3-14B",
    trust_remote_code=True
)

model.eval()

Inference Example

import torch

# Example input (format used during training)
question = "Which of the following is equivalent to 3(2x + 5)?"
answer = "6x + 5"
is_correct = "No"
explanation = "I distributed the 3 to 2x but forgot to distribute it to 5"

# Format input
input_text = f"""Question: {question}
Answer: {answer}
Is Correct Answer: {is_correct}
Student Explanation: {explanation}"""

# Tokenize
inputs = tokenizer(
    input_text,
    truncation=True,
    max_length=256,
    return_tensors="pt"
).to(model.device)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    probs = torch.nn.functional.softmax(logits, dim=-1)

# Get top 3 predictions
top_k = 3
top_probs, top_indices = torch.topk(probs, top_k, dim=-1)

print(f"Top {top_k} Predictions:")
for i in range(top_k):
    class_id = top_indices[0][i].item()
    confidence = top_probs[0][i].item()
    print(f"{i+1}. Class {class_id}: {confidence:.4f}")

Batch Inference

import numpy as np
import pandas as pd

def predict_batch(texts, batch_size=8):
    """Process multiple examples efficiently"""
    all_probs = []
    
    for i in range(0, len(texts), batch_size):
        batch_texts = texts[i:i+batch_size]
        
        # Tokenize batch
        inputs = tokenizer(
            batch_texts,
            truncation=True,
            max_length=256,
            padding=True,
            return_tensors="pt"
        ).to(model.device)
        
        # Inference
        with torch.no_grad():
            outputs = model(**inputs)
            probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
            all_probs.append(probs.cpu().numpy())
    
    return np.vstack(all_probs)

# Example usage
test_data = pd.read_csv("test.csv")
formatted_texts = [
    f"Question: {row['QuestionText']}\n"
    f"Answer: {row['MC_Answer']}\n"
    f"Is Correct Answer: {row['IsCorrect']}\n"
    f"Student Explanation: {row['StudentExplanation']}"
    for _, row in test_data.iterrows()
]

predictions = predict_batch(formatted_texts, batch_size=8)
top3_classes = np.argsort(-predictions, axis=1)[:, :3]
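
The integers in top3_classes index the classification head, not label strings. Mapping them back to Category:Misconception labels depends on the label encoding used during training; if the saved config carries an id2label mapping, something like the following works (an assumption to verify against your own label encoder):

# Assumes model.config.id2label was populated at training time; otherwise it
# falls back to generic names like "LABEL_0" and you need your own mapping.
id2label = model.config.id2label
top3_labels = [[id2label[int(idx)] for idx in row] for row in top3_classes]
print(top3_labels[:3])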

Training Details

Training Method: QLoRA (Quantized Low-Rank Adaptation)

This model uses QLoRA, an efficient fine-tuning technique that combines:

  1. 4-bit Quantization (NF4): Reduces memory footprint by quantizing base model weights
  2. Double Quantization: Further compresses quantization constants
  3. LoRA Adapters: Trains only small low-rank matrices instead of full model weights
  4. bfloat16 Compute: Uses bfloat16 for actual computations despite 4-bit storage

Benefits:

  • Trains a 14B-parameter model on 4×L4 GPUs (24 GB VRAM each)
  • Reduces memory by ~75% compared to full fine-tuning
  • Typically retains accuracy close to full fine-tuning (the QLoRA paper reports matching 16-bit fine-tuning quality)
  • Produces small adapter checkpoints that are cheap to store and share

LoRA Configuration

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                    # Low-rank dimension
    lora_alpha=32,          # Scaling factor
    target_modules=[        # Attention & MLP layers
        "q_proj", 
        "v_proj", 
        "o_proj", 
        "gate_proj", 
        "up_proj", 
        "down_proj"
    ],
    lora_dropout=0.1,       # Regularization
    bias="none",
    task_type="SEQ_CLS",    # Sequence classification
    modules_to_save=["score"]  # Save classification head
)
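
A minimal sketch of how this configuration is typically applied on top of the 4-bit base model before training; the exact training script is not part of this repository:

from peft import get_peft_model, prepare_model_for_kbit_training

# `base_model` here is the 4-bit quantized Qwen3-14B classification model
# loaded with the BitsAndBytesConfig shown earlier.
base_model = prepare_model_for_kbit_training(base_model)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only LoRA matrices + "score" head are trainable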

Training Hyperparameters

Hyperparameter            Value
------------------------  --------------------------------------
Base Model                Qwen/Qwen3-14B
Epochs                    3
Learning Rate             2e-4
LR Scheduler              Cosine with warmup
Warmup Ratio              0.1
Batch Size                8 per device
Gradient Accumulation     4 steps
Effective Batch Size      128 (8 × 4 devices × 4 accumulation)
Max Sequence Length       256 tokens
Precision                 bfloat16
Gradient Checkpointing    Enabled
Quantization              4-bit NF4
GPUs                      4× NVIDIA L4 (24 GB)
Training Time             11 hours 34 minutes
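
As a rough sketch, the hyperparameters above map onto a Hugging Face TrainingArguments configuration along these lines (the output directory is illustrative, and the original training script may differ in details):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-14b-map-qlora",   # illustrative path
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,      # 8 per device x 4 GPUs x 4 steps = 128 effective
    bf16=True,
    gradient_checkpointing=True,
)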

Data Format

The model expects inputs in the following format:

Question: {question_text}
Answer: {student_answer}
Is Correct Answer: {Yes/No}
Student Explanation: {student_reasoning}
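
A small helper that builds this string from the individual fields, matching the formatting used in the examples above:

def format_example(question, answer, is_correct, explanation):
    """Assemble the model input in the exact format used during training."""
    return (
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        f"Is Correct Answer: {is_correct}\n"
        f"Student Explanation: {explanation}"
    )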

Use Cases

✅ Recommended Use Cases

  • Educational Platforms: Automated feedback on student math reasoning
  • Teacher Support Tools: Identifying common misconception patterns
  • Adaptive Learning Systems: Personalizing instruction based on detected errors
  • Research: Analyzing student thinking patterns at scale
  • Assessment Tools: Diagnostic evaluation of conceptual understanding

❌ Not Recommended For

  • Math problem solving (this is a classification model, not a solver)
  • Generating explanations (classification only)
  • Non-mathematical domains
  • Grading or high-stakes testing without human review
  • Students outside 9-16 age range without validation

Ethical Considerations

Intended Use

This model is designed to support educators, not replace them. It should be used as a diagnostic tool to help teachers:

  • Identify common misconceptions quickly
  • Provide targeted feedback
  • Understand student reasoning patterns

Potential Risks

  1. Bias: May reflect biases in training data (e.g., language patterns, demographics)
  2. Misclassification: Not 100% accurate; false positives/negatives will occur
  3. Over-reliance: Should not be sole basis for educational decisions
  4. Privacy: Student data must be handled according to educational privacy regulations (FERPA, GDPR, etc.)
  5. Fairness: May perform differently across student populations

Recommendations

  • Always have human review for high-stakes decisions
  • Monitor performance across different student populations
  • Combine with other assessment methods
  • Respect student privacy and data protection laws
  • Use as one tool among many in educational practice

Citation

If you use this model in your research or application, please cite:

@misc{qwen3-14b-math-misconception-lora,
  author = {Jatin Mehra},
  title = {Qwen3-14B LoRA for Math Misconception Detection},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/jatinmehra/Qwen-3-14B-MATH-Misconception-Annotation-Project}},
  note = {Silver Medal Solution (45th place) in Kaggle MAP Competition}
}

@inproceedings{map-competition-2025,
  title = {MAP: Charting Student Math Misunderstandings},
  author = {Vanderbilt University and The Learning Agency},
  year = {2025},
  organization = {Kaggle}
}

Questions or Issues? Please open an issue on GitHub

Made with ❤️ for better math education
