# Phi-3 Domain Classification Model - 98.26% Accuracy
Fine-tuned Phi-3-mini-4k-instruct achieving 98.26% accuracy on domain classification across 16 domains.
## Model Performance
- Test Accuracy: 98.26%
- F1 (Macro): 97.18%
- F1 (Weighted): 98.07%
- Perfect Domains: 9/16 (100% precision & recall)
- Near-Perfect Domains: 15/16 (>95% F1)
## Performance Metrics
| Metric | Value |
|---|---|
| Accuracy | 98.26% |
| F1 (Macro) | 97.18% |
| F1 (Weighted) | 98.07% |
| Training Time | ~3-4 hours |
| Perfect Domains | 9/16 |
## Supported Domains
The model classifies text into 16 domains:
- coding - Programming and software development (98% F1)
- api_generation - API design and implementation (98% F1)
- mathematics - Mathematical problems and concepts (99% F1)
- data_analysis - Data science and analytics (96% F1)
- science - Scientific queries (100% F1) ✅
- medicine - Medical and healthcare topics (100% F1) ✅
- business - Business and commerce (97% F1)
- law - Legal matters (100% F1) ✅
- technology - Tech industry and products (100% F1) ✅
- literature - Books, writing, poetry (100% F1) ✅
- creative_content - Art, music, creative work (100% F1) ✅
- education - Learning and teaching (100% F1) ✅
- general_knowledge - General information (97% F1)
- ambiguous - Unclear or multi-interpretation queries (100% F1) ✅
- sensitive - Sensitive topics requiring care (100% F1) ✅
- multi_domain - Cross-domain queries (71% F1)

✅ = Perfect classification (100% precision & recall)
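
If you need the label set programmatically (for example, to validate parsed predictions), the 16 names above can be kept as a constant:

```python
# The 16 domain labels used by this classifier (copied from the list above).
DOMAINS = [
    "coding", "api_generation", "mathematics", "data_analysis",
    "science", "medicine", "business", "law",
    "technology", "literature", "creative_content", "education",
    "general_knowledge", "ambiguous", "sensitive", "multi_domain",
]
```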
## Training Configuration

### Model Architecture
- Base Model: microsoft/Phi-3-mini-4k-instruct (3.82B parameters)
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- LoRA Rank: 32
- LoRA Alpha: 64
- Target Modules: qkv_proj, o_proj, gate_up_proj, down_proj
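
A minimal `peft` `LoraConfig` matching these settings might look like the sketch below; the dropout value is an assumption, as it is not documented in this card:

```python
from peft import LoraConfig

# Sketch of the adapter configuration described above.
lora_config = LoraConfig(
    r=32,                 # LoRA rank
    lora_alpha=64,        # LoRA alpha
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],
    lora_dropout=0.05,    # assumption: not documented in this card
    task_type="CAUSAL_LM",
)
```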
### Training Hyperparameters
- Epochs: 25 (proven optimal)
- Learning Rate: 2e-4
- LR Scheduler: Cosine
- Warmup Ratio: 10%
- Batch Size: 32 (effective)
- Label Smoothing: 0.1
- Precision: BF16
### Training Strategy

- ✅ Clean dataset (no data augmentation)
- ✅ Standard cosine schedule
- ✅ Best checkpoint loading
- ✅ Gradient checkpointing
- ✅ Reproducible (seed=42)
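
For reference, the hyperparameters and strategy above map roughly onto `transformers.TrainingArguments` as sketched below. The output directory and the per-device batch size / gradient-accumulation split (8 x 4 = effective 32) are assumptions:

```python
from transformers import TrainingArguments

# Sketch only: values taken from the tables above, with assumptions noted in comments.
training_args = TrainingArguments(
    output_dir="phi3-domain-classifier",   # assumption
    num_train_epochs=25,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    per_device_train_batch_size=8,         # assumption: 8 x 4 accumulation = effective 32
    gradient_accumulation_steps=4,
    label_smoothing_factor=0.1,
    bf16=True,
    gradient_checkpointing=True,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,           # best-checkpoint loading, as noted above
    seed=42,
)
```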
## Quick Start

### Installation

```bash
pip install transformers peft torch
```

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
import json

# Load model
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
model = PeftModel.from_pretrained(
    base_model,
    "ovinduG/phi3-domain-classifier-98.26"
)
tokenizer = AutoTokenizer.from_pretrained(
    "ovinduG/phi3-domain-classifier-98.26",
    trust_remote_code=True
)

# Classification function
def classify_domain(text):
    messages = [
        {
            "role": "system",
            "content": "You are a domain classifier. Respond with JSON."
        },
        {
            "role": "user",
            "content": f"Classify: {text}"
        }
    ]

    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt"
    ).to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=50,
            temperature=0.1,
            do_sample=True,
            pad_token_id=tokenizer.pad_token_id
        )

    response = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:],
        skip_special_tokens=True
    )

    # Parse the JSON response (strip optional code fences)
    try:
        response_clean = response.strip()
        if '```' in response_clean:
            response_clean = response_clean.split('```')[1]
            if response_clean.startswith('json'):
                response_clean = response_clean[4:]
        return json.loads(response_clean.strip())
    except:
        return {"primary_domain": "unknown", "confidence": "low"}

# Example usage
result = classify_domain("Write a Python function to sort a list")
print(result)
# Output: {"primary_domain": "coding", "confidence": "high"}

result = classify_domain("What are the symptoms of diabetes?")
print(result)
# Output: {"primary_domain": "medicine", "confidence": "high"}

result = classify_domain("Explain quantum entanglement")
print(result)
# Output: {"primary_domain": "science", "confidence": "high"}
```
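
Note that with `temperature=0.1` and `do_sample=True` the output can still vary slightly between runs. If you prefer fully deterministic classification, a greedy-decoding variant of the `generate` call above (everything else unchanged) would be:

```python
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=False,                      # greedy decoding: deterministic output
        pad_token_id=tokenizer.pad_token_id
    )
```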
### Batch Classification

```python
def classify_batch(texts, batch_size=8):
    """Classify multiple texts, processed sequentially in chunks of batch_size."""
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        for text in batch:
            results.append(classify_domain(text))
    return results

# Example
texts = [
    "How to implement OAuth2?",
    "Best practices for diabetes management",
    "Write a sorting algorithm in Python"
]
results = classify_batch(texts)
for text, result in zip(texts, results):
    print(f"{text[:50]:50s} → {result['primary_domain']}")
```
## Performance Details

### Per-Domain Results
| Domain | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| ambiguous | 1.00 | 1.00 | 1.00 | 45 |
| api_generation | 0.96 | 1.00 | 0.98 | 45 |
| business | 0.96 | 0.98 | 0.97 | 44 |
| coding | 0.95 | 1.00 | 0.98 | 42 |
| creative_content | 1.00 | 1.00 | 1.00 | 45 |
| data_analysis | 0.93 | 1.00 | 0.96 | 41 |
| education | 1.00 | 1.00 | 1.00 | 45 |
| general_knowledge | 0.96 | 0.98 | 0.97 | 45 |
| law | 1.00 | 1.00 | 1.00 | 46 |
| literature | 1.00 | 1.00 | 1.00 | 45 |
| mathematics | 0.98 | 1.00 | 0.99 | 46 |
| medicine | 1.00 | 1.00 | 1.00 | 45 |
| multi_domain | 1.00 | 0.55 | 0.71 | 22 |
| science | 1.00 | 1.00 | 1.00 | 45 |
| sensitive | 1.00 | 1.00 | 1.00 | 45 |
| technology | 1.00 | 1.00 | 1.00 | 44 |
### Comparison with Baselines
| Model | Accuracy | Notes |
|---|---|---|
| This model (25 epochs) | 98.26% | ✅ Optimal |
| Original 25 epochs | 97.97% | Good baseline |
| 30 epochs attempt | 94.64% | Overfitted ❌ |
| 50 epochs attempt | ~85-90% | Severe overfitting ❌ |
## Use Cases

### 1. Content Routing

Route user queries to appropriate specialists or systems:

```python
query = "How do I treat a sprained ankle?"
domain = classify_domain(query)["primary_domain"]
# → "medicine" → route to a medical expert
```
### 2. Support Ticket Classification

Automatically categorize support tickets:

```python
ticket = "Our API returns 401 errors"
domain = classify_domain(ticket)["primary_domain"]
# → "api_generation" → route to the API team
```
### 3. Content Moderation

Identify sensitive content requiring review:

```python
post = "Discussion about controversial topic"
result = classify_domain(post)
if result["primary_domain"] == "sensitive":
    # Flag for manual review
    pass
```
### 4. Search & Discovery

Improve search by understanding query intent:

```python
search_query = "best sorting algorithms"
domain = classify_domain(search_query)["primary_domain"]
# → "coding" → show programming results
```
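
The routing-style use cases above can be wired together with a simple dispatch table; the queue names below are hypothetical placeholders, not part of this model:

```python
# Hypothetical routing targets: replace with your own queues or handlers.
ROUTES = {
    "medicine": "medical_expert_queue",
    "api_generation": "api_team_queue",
    "sensitive": "manual_review_queue",
    "coding": "engineering_queue",
}

def route(text, default="general_queue"):
    domain = classify_domain(text)["primary_domain"]
    return ROUTES.get(domain, default)

print(route("How do I treat a sprained ankle?"))  # medical_expert_queue
```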
## Model Behavior

### Strengths

- ✅ 9 perfect domains (100% precision & recall)
- ✅ High precision across all domains (93-100%)
- ✅ Consistent performance on similar queries
- ✅ Fast inference with LoRA
- ✅ Low memory footprint (~200 MB adapters)
### Limitations

- ⚠️ Multi-domain classification is challenging (71% F1)
  - Queries spanning multiple domains are harder to classify
  - The model tends to pick a single primary domain
- ⚠️ Requires the exact domain list - new domains cannot be handled without retraining
- ⚠️ English only - trained on English text
### Recommendations

- For multi-domain queries, consider an ensemble or multi-label classification approach
- Validate outputs in production with confidence thresholds (see the sketch below)
- Monitor edge cases and collect feedback for model improvements
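
As a minimal sketch of confidence gating, assuming the `confidence` field takes the `high`/`medium`/`low` values shown in the examples above (the exact label set is an assumption):

```python
# Assumption: confidence is reported as "high" / "medium" / "low",
# as in the example outputs earlier in this card.
ACCEPTED_CONFIDENCE = {"high"}

def classify_with_fallback(text, fallback="general_knowledge"):
    result = classify_domain(text)
    if result.get("confidence") not in ACCEPTED_CONFIDENCE:
        # Route low-confidence predictions to a safe default
        # (or to human review, depending on your pipeline).
        return {"primary_domain": fallback, "confidence": result.get("confidence", "low")}
    return result
```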
## Repository Contents

- `adapter_config.json` - LoRA configuration
- `adapter_model.safetensors` - Fine-tuned LoRA weights
- Tokenizer files - Tokenizer configuration
- `test_results.json` - Comprehensive evaluation metrics
- `training_curves.png` - Training/validation loss curves
- `confusion_matrix.png` - Per-domain performance visualization
- `final_dataset_*.csv` - Training/validation/test datasets
## Reproducibility
This model can be reproduced using the exact configuration above. Key factors:
- Seed: 42 (for reproducibility)
- No data augmentation (clean training)
- Exact hyperparameters documented
- Best checkpoint selection (not last)
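
If you rerun training, seeding everything up front is the simplest way to match this setup; `transformers.set_seed` seeds Python's `random`, NumPy, and PyTorch in one call:

```python
from transformers import set_seed

set_seed(42)  # seeds random, numpy and torch (including CUDA) for reproducibility
```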
## Training History
The model was trained with several attempts to optimize performance:
- Original run: 97.97% accuracy (25 epochs)
- 30 epochs attempt: 94.64% - overfitted due to data augmentation
- 50 epochs attempt: ~85-90% - severe overfitting
- Final reproduction: 98.26% - optimal configuration ✅
Key insight: 25 epochs is the sweet spot for this task and dataset.
## License & Citation

### License
This model is released under MIT License. The base Phi-3 model has its own license from Microsoft.
### Citation
If you use this model, please cite:
```bibtex
@misc{phi3-domain-classifier-98,
  author       = {ovinduG},
  title        = {Phi-3 Domain Classification Model - 98.26% Accuracy},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ovinduG/phi3-domain-classifier-98.26}}
}
```
Also cite the original Phi-3 paper:
```bibtex
@article{phi3,
  title  = {Phi-3 Technical Report},
  author = {Microsoft},
  year   = {2024}
}
```
## Contributing
Found an issue or have suggestions? Please open an issue on the model repository.
## Contact
- Author: ovinduG
- Repository: https://huggingface.co/ovinduG/phi3-domain-classifier-98.26
- Upload Date: 2025-12-16
## Acknowledgments
- Microsoft for the Phi-3 base model
- Hugging Face for the transformers library
- PEFT library for LoRA implementation
Model Status: ✅ Production-Ready | Accuracy: 98.26% | Perfect Domains: 9/16