πŸ† Phi-3 Domain Classification Model - 98.26% Accuracy

Fine-tuned Phi-3-mini-4k-instruct achieving 98.26% accuracy on domain classification across 16 domains.

🎯 Model Performance

  • Test Accuracy: 98.26%
  • F1 (Macro): 97.18%
  • F1 (Weighted): 98.07%
  • Perfect Domains: 9/16 (100% precision & recall)
  • Near-Perfect Domains: 15/16 (>95% F1)

πŸ“Š Performance Metrics

| Metric | Value |
|--------|-------|
| Accuracy | 98.26% |
| F1 (Macro) | 97.18% |
| F1 (Weighted) | 98.07% |
| Training Time | ~3-4 hours |
| Perfect Domains | 9/16 |

🎨 Supported Domains

The model classifies text into 16 domains:

  1. coding - Programming and software development (98% F1)
  2. api_generation - API design and implementation (98% F1)
  3. mathematics - Mathematical problems and concepts (99% F1)
  4. data_analysis - Data science and analytics (96% F1)
  5. science - Scientific queries (100% F1) ⭐
  6. medicine - Medical and healthcare topics (100% F1) ⭐
  7. business - Business and commerce (97% F1)
  8. law - Legal matters (100% F1) ⭐
  9. technology - Tech industry and products (100% F1) ⭐
  10. literature - Books, writing, poetry (100% F1) ⭐
  11. creative_content - Art, music, creative work (100% F1) ⭐
  12. education - Learning and teaching (100% F1) ⭐
  13. general_knowledge - General information (97% F1)
  14. ambiguous - Unclear or multi-interpretation queries (100% F1) ⭐
  15. sensitive - Sensitive topics requiring care (100% F1) ⭐
  16. multi_domain - Cross-domain queries (71% F1)

⭐ = Perfect classification (100% precision & recall)
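
For downstream code it can help to keep this label set as a constant. A minimal sketch; the strings are copied from the list above and are assumed to match the labels the model emits exactly:

DOMAINS = [
    "coding", "api_generation", "mathematics", "data_analysis",
    "science", "medicine", "business", "law",
    "technology", "literature", "creative_content", "education",
    "general_knowledge", "ambiguous", "sensitive", "multi_domain",
]

def is_known_domain(label):
    # Guard against labels outside the trained set (e.g. a malformed generation)
    return label in DOMAINS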

πŸ”§ Training Configuration

Model Architecture

  • Base Model: microsoft/Phi-3-mini-4k-instruct (3.82B parameters)
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • LoRA Rank: 32
  • LoRA Alpha: 64
  • Target Modules: qkv_proj, o_proj, gate_up_proj, down_proj

Training Hyperparameters

  • Epochs: 25 (proven optimal)
  • Learning Rate: 2e-4
  • LR Scheduler: Cosine
  • Warmup Ratio: 10%
  • Batch Size: 32 (effective)
  • Label Smoothing: 0.1
  • Precision: BF16

Training Strategy

  • βœ… Clean dataset (no data augmentation)
  • βœ… Standard cosine schedule
  • βœ… Best checkpoint loading
  • βœ… Gradient checkpointing
  • βœ… Reproducible (seed=42)

πŸš€ Quick Start

Installation

pip install transformers peft torch

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
import json

# Load model
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

model = PeftModel.from_pretrained(
    base_model,
    "ovinduG/phi3-domain-classifier-98.26"
)

tokenizer = AutoTokenizer.from_pretrained(
    "ovinduG/phi3-domain-classifier-98.26",
    trust_remote_code=True
)

# Classification function
def classify_domain(text):
    messages = [
        {
            "role": "system",
            "content": "You are a domain classifier. Respond with JSON."
        },
        {
            "role": "user",
            "content": f"Classify: {text}"
        }
    ]

    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt"
    ).to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=50,
            temperature=0.1,
            do_sample=True,
            pad_token_id=tokenizer.pad_token_id
        )

    response = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:],
        skip_special_tokens=True
    )

    # Parse the JSON response (strip an optional ```json code fence first)
    try:
        response_clean = response.strip()
        if '```' in response_clean:
            response_clean = response_clean.split('```')[1]
        if response_clean.startswith('json'):
            response_clean = response_clean[4:]
        return json.loads(response_clean.strip())
    except (json.JSONDecodeError, IndexError):
        return {"primary_domain": "unknown", "confidence": "low"}

# Example usage
result = classify_domain("Write a Python function to sort a list")
print(result)
# Output: {"primary_domain": "coding", "confidence": "high"}

result = classify_domain("What are the symptoms of diabetes?")
print(result)
# Output: {"primary_domain": "medicine", "confidence": "high"}

result = classify_domain("Explain quantum entanglement")
print(result)
# Output: {"primary_domain": "science", "confidence": "high"}

Batch Classification

def classify_batch(texts, batch_size=8):
    # Classify multiple texts. Inference is still one generate() call per text;
    # batch_size only controls how the input list is chunked.
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        for text in batch:
            results.append(classify_domain(text))
    return results

# Example
texts = [
    "How to implement OAuth2?",
    "Best practices for diabetes management",
    "Write a sorting algorithm in Python"
]

results = classify_batch(texts)
for text, result in zip(texts, results):
    print(f"{text[:50]:50s} β†’ {result['primary_domain']}")

πŸ“ˆ Performance Details

Per-Domain Results

| Domain | Precision | Recall | F1-Score | Support |
|--------|-----------|--------|----------|---------|
| ambiguous | 1.00 | 1.00 | 1.00 | 45 |
| api_generation | 0.96 | 1.00 | 0.98 | 45 |
| business | 0.96 | 0.98 | 0.97 | 44 |
| coding | 0.95 | 1.00 | 0.98 | 42 |
| creative_content | 1.00 | 1.00 | 1.00 | 45 |
| data_analysis | 0.93 | 1.00 | 0.96 | 41 |
| education | 1.00 | 1.00 | 1.00 | 45 |
| general_knowledge | 0.96 | 0.98 | 0.97 | 45 |
| law | 1.00 | 1.00 | 1.00 | 46 |
| literature | 1.00 | 1.00 | 1.00 | 45 |
| mathematics | 0.98 | 1.00 | 0.99 | 46 |
| medicine | 1.00 | 1.00 | 1.00 | 45 |
| multi_domain | 1.00 | 0.55 | 0.71 | 22 |
| science | 1.00 | 1.00 | 1.00 | 45 |
| sensitive | 1.00 | 1.00 | 1.00 | 45 |
| technology | 1.00 | 1.00 | 1.00 | 44 |
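
These per-domain numbers can be re-derived with scikit-learn once predictions for the test split are collected. A sketch using the classify_domain helper from Quick Start; the file name and the text/domain column names are assumptions based on the final_dataset_*.csv files in this repo:

import pandas as pd
from sklearn.metrics import classification_report

test_df = pd.read_csv("final_dataset_test.csv")  # hypothetical file name
preds = [classify_domain(t)["primary_domain"] for t in test_df["text"]]
print(classification_report(test_df["domain"], preds, digits=2))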

Comparison with Baselines

| Model | Accuracy | Notes |
|-------|----------|-------|
| This model (25 epochs) | 98.26% | βœ… Optimal |
| Original 25 epochs | 97.97% | Good baseline |
| 30 epochs attempt | 94.64% | Overfitted ❌ |
| 50 epochs attempt | ~85-90% | Severe overfitting ❌ |

🎯 Use Cases

1. Content Routing

Route user queries to appropriate specialists or systems:

query = "How do I treat a sprained ankle?"
domain = classify_domain(query)["primary_domain"]
# β†’ "medicine" β†’ Route to medical expert

2. Support Ticket Classification

Automatically categorize support tickets:

ticket = "Our API returns 401 errors"
domain = classify_domain(ticket)["primary_domain"]
# β†’ "api_generation" β†’ Route to API team

3. Content Moderation

Identify sensitive content requiring review:

post = "Discussion about controversial topic"
result = classify_domain(post)
if result["primary_domain"] == "sensitive":
    # Flag for manual review
    pass

4. Search & Discovery

Improve search by understanding query intent:

search_query = "best sorting algorithms"
domain = classify_domain(search_query)["primary_domain"]
# β†’ "coding" β†’ Show programming results

πŸ” Model Behavior

Strengths

  • βœ… 9 perfect domains (100% precision & recall)
  • βœ… High precision across all domains (93-100%)
  • βœ… Consistent performance on similar queries
  • βœ… Fast inference with LoRA
  • βœ… Low memory footprint (~200MB adapters)

Limitations

  • ⚠️ Multi-domain classification is challenging (71% F1)
    • Queries spanning multiple domains are harder to classify
    • Model tends to pick a single primary domain
  • ⚠️ Requires exact domain list - cannot handle new domains without retraining
  • ⚠️ English only - trained on English text

Recommendations

  • For multi-domain queries, consider using ensemble or multi-label classification
  • Validate outputs in production with confidence thresholds
  • Monitor edge cases and collect feedback for model improvements

πŸ“ Repository Contents

  • adapter_config.json - LoRA configuration
  • adapter_model.safetensors - Fine-tuned LoRA weights
  • tokenizer files - Tokenizer configuration
  • test_results.json - Comprehensive evaluation metrics
  • training_curves.png - Training/validation loss curves
  • confusion_matrix.png - Per-domain performance visualization
  • final_dataset_*.csv - Training/validation/test datasets

πŸ”„ Reproducibility

This model can be reproduced using the exact configuration above. Key factors:

  • Seed: 42 (for reproducibility)
  • No data augmentation (clean training)
  • Exact hyperparameters documented
  • Best checkpoint selection (not last)

πŸ“Š Training History

The model was trained with several attempts to optimize performance:

  1. Original run: 97.97% accuracy (25 epochs)
  2. 30 epochs attempt: 94.64% - overfitted due to data augmentation
  3. 50 epochs attempt: ~85-90% - severe overfitting
  4. Final reproduction: 98.26% - optimal configuration βœ…

Key insight: 25 epochs is the sweet spot for this task and dataset.

βš–οΈ License & Citation

License

This model is released under MIT License. The base Phi-3 model has its own license from Microsoft.

Citation

If you use this model, please cite:

@misc{phi3-domain-classifier-98,
  author = {ovinduG},
  title = {Phi-3 Domain Classification Model - 98.26% Accuracy},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ovinduG/phi3-domain-classifier-98.26}}
}

Also cite the original Phi-3 paper:

@article{abdin2024phi3,
  title={Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone},
  author={Abdin, Marah and others},
  journal={arXiv preprint arXiv:2404.14219},
  year={2024}
}

🀝 Contributing

Found an issue or have suggestions? Please open an issue on the model repository.

πŸ™ Acknowledgments

  • Microsoft for the Phi-3 base model
  • Hugging Face for the transformers library
  • PEFT library for LoRA implementation

Model Status: βœ… Production-Ready | Accuracy: 98.26% | Perfect Domains: 9/16
