# Phi-3 Domain Classification Model - 98.26% Accuracy
Fine-tuned Phi-3-mini-4k-instruct achieving 98.26% accuracy on domain classification across 16 domains.
## Model Performance
- Test Accuracy: 98.26%
- F1 (Macro): 97.18%
- F1 (Weighted): 98.07%
- Perfect Domains: 9/16 (100% precision & recall)
- Near-Perfect Domains: 15/16 (>95% F1)
## Performance Metrics
| Metric | Value |
|---|---|
| Accuracy | 98.26% |
| F1 (Macro) | 97.18% |
| F1 (Weighted) | 98.07% |
| Training Time | ~3-4 hours |
| Perfect Domains | 9/16 |
## Supported Domains
The model classifies text into 16 domains:
- coding - Programming and software development (98% F1)
- api_generation - API design and implementation (98% F1)
- mathematics - Mathematical problems and concepts (99% F1)
- data_analysis - Data science and analytics (96% F1)
- science - Scientific queries (100% F1) ✅
- medicine - Medical and healthcare topics (100% F1) ✅
- business - Business and commerce (97% F1)
- law - Legal matters (100% F1) ✅
- technology - Tech industry and products (100% F1) ✅
- literature - Books, writing, poetry (100% F1) ✅
- creative_content - Art, music, creative work (100% F1) ✅
- education - Learning and teaching (100% F1) ✅
- general_knowledge - General information (97% F1)
- ambiguous - Unclear or multi-interpretation queries (100% F1) ✅
- sensitive - Sensitive topics requiring care (100% F1) ✅
- multi_domain - Cross-domain queries (71% F1)

✅ = Perfect classification (100% precision & recall)
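
If you need the label set programmatically (for example, to validate parsed predictions), the 16 names above can be kept as a constant:

```python
# The 16 domain labels used by this classifier (copied from the list above).
DOMAINS = [
    "coding", "api_generation", "mathematics", "data_analysis",
    "science", "medicine", "business", "law",
    "technology", "literature", "creative_content", "education",
    "general_knowledge", "ambiguous", "sensitive", "multi_domain",
]
```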
## Training Configuration

### Model Architecture
- Base Model: microsoft/Phi-3-mini-4k-instruct (3.82B parameters)
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- LoRA Rank: 32
- LoRA Alpha: 64
- Target Modules: qkv_proj, o_proj, gate_up_proj, down_proj
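
A minimal `peft` `LoraConfig` matching these settings might look like the sketch below; the dropout value is an assumption, as it is not documented in this card:

```python
from peft import LoraConfig

# Sketch of the adapter configuration described above.
lora_config = LoraConfig(
    r=32,                 # LoRA rank
    lora_alpha=64,        # LoRA alpha
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],
    lora_dropout=0.05,    # assumption: not documented in this card
    task_type="CAUSAL_LM",
)
```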
### Training Hyperparameters
- Epochs: 25 (proven optimal)
- Learning Rate: 2e-4
- LR Scheduler: Cosine
- Warmup Ratio: 10%
- Batch Size: 32 (effective)
- Label Smoothing: 0.1
- Precision: BF16
### Training Strategy

- ✅ Clean dataset (no data augmentation)
- ✅ Standard cosine schedule
- ✅ Best checkpoint loading
- ✅ Gradient checkpointing
- ✅ Reproducible (seed=42)
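
For reference, the hyperparameters and strategy above map roughly onto `transformers.TrainingArguments` as sketched below. The output directory and the per-device batch size / gradient-accumulation split (8 x 4 = effective 32) are assumptions:

```python
from transformers import TrainingArguments

# Sketch only: values taken from the tables above, with assumptions noted in comments.
training_args = TrainingArguments(
    output_dir="phi3-domain-classifier",   # assumption
    num_train_epochs=25,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    per_device_train_batch_size=8,         # assumption: 8 x 4 accumulation = effective 32
    gradient_accumulation_steps=4,
    label_smoothing_factor=0.1,
    bf16=True,
    gradient_checkpointing=True,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,           # best-checkpoint loading, as noted above
    seed=42,
)
```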
## Quick Start

### Installation

```bash
pip install transformers peft torch
```

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
import json

# Load model
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
model = PeftModel.from_pretrained(
    base_model,
    "ovinduG/phi3-domain-classifier-98.26"
)
tokenizer = AutoTokenizer.from_pretrained(
    "ovinduG/phi3-domain-classifier-98.26",
    trust_remote_code=True
)

# Classification function
def classify_domain(text):
    messages = [
        {
            "role": "system",
            "content": "You are a domain classifier. Respond with JSON."
        },
        {
            "role": "user",
            "content": f"Classify: {text}"
        }
    ]

    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt"
    ).to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=50,
            temperature=0.1,
            do_sample=True,
            pad_token_id=tokenizer.pad_token_id
        )

    response = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:],
        skip_special_tokens=True
    )

    # Parse the JSON response (strip optional code fences)
    try:
        response_clean = response.strip()
        if '```' in response_clean:
            response_clean = response_clean.split('```')[1]
            if response_clean.startswith('json'):
                response_clean = response_clean[4:]
        return json.loads(response_clean.strip())
    except:
        return {"primary_domain": "unknown", "confidence": "low"}

# Example usage
result = classify_domain("Write a Python function to sort a list")
print(result)
# Output: {"primary_domain": "coding", "confidence": "high"}

result = classify_domain("What are the symptoms of diabetes?")
print(result)
# Output: {"primary_domain": "medicine", "confidence": "high"}

result = classify_domain("Explain quantum entanglement")
print(result)
# Output: {"primary_domain": "science", "confidence": "high"}
```
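
Note that with `temperature=0.1` and `do_sample=True` the output can still vary slightly between runs. If you prefer fully deterministic classification, a greedy-decoding variant of the `generate` call above (everything else unchanged) would be:

```python
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=False,                      # greedy decoding: deterministic output
        pad_token_id=tokenizer.pad_token_id
    )
```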
### Batch Classification

```python
def classify_batch(texts, batch_size=8):
    """Classify multiple texts, processed sequentially in chunks of batch_size."""
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        for text in batch:
            results.append(classify_domain(text))
    return results

# Example
texts = [
    "How to implement OAuth2?",
    "Best practices for diabetes management",
    "Write a sorting algorithm in Python"
]
results = classify_batch(texts)
for text, result in zip(texts, results):
    print(f"{text[:50]:50s} → {result['primary_domain']}")
```
## Performance Details

### Per-Domain Results
| Domain | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| ambiguous | 1.00 | 1.00 | 1.00 | 45 |
| api_generation | 0.96 | 1.00 | 0.98 | 45 |
| business | 0.96 | 0.98 | 0.97 | 44 |
| coding | 0.95 | 1.00 | 0.98 | 42 |
| creative_content | 1.00 | 1.00 | 1.00 | 45 |
| data_analysis | 0.93 | 1.00 | 0.96 | 41 |
| education | 1.00 | 1.00 | 1.00 | 45 |
| general_knowledge | 0.96 | 0.98 | 0.97 | 45 |
| law | 1.00 | 1.00 | 1.00 | 46 |
| literature | 1.00 | 1.00 | 1.00 | 45 |
| mathematics | 0.98 | 1.00 | 0.99 | 46 |
| medicine | 1.00 | 1.00 | 1.00 | 45 |
| multi_domain | 1.00 | 0.55 | 0.71 | 22 |
| science | 1.00 | 1.00 | 1.00 | 45 |
| sensitive | 1.00 | 1.00 | 1.00 | 45 |
| technology | 1.00 | 1.00 | 1.00 | 44 |
### Comparison with Baselines
| Model | Accuracy | Notes |
|---|---|---|
| This model (25 epochs) | 98.26% | ✅ Optimal |
| Original 25 epochs | 97.97% | Good baseline |
| 30 epochs attempt | 94.64% | Overfitted ❌ |
| 50 epochs attempt | ~85-90% | Severe overfitting ❌ |
## Use Cases

### 1. Content Routing

Route user queries to appropriate specialists or systems:

```python
query = "How do I treat a sprained ankle?"
domain = classify_domain(query)["primary_domain"]
# → "medicine" → route to a medical expert
```
### 2. Support Ticket Classification

Automatically categorize support tickets:

```python
ticket = "Our API returns 401 errors"
domain = classify_domain(ticket)["primary_domain"]
# → "api_generation" → route to the API team
```
### 3. Content Moderation

Identify sensitive content requiring review:

```python
post = "Discussion about controversial topic"
result = classify_domain(post)
if result["primary_domain"] == "sensitive":
    # Flag for manual review
    pass
```
### 4. Search & Discovery

Improve search by understanding query intent:

```python
search_query = "best sorting algorithms"
domain = classify_domain(search_query)["primary_domain"]
# → "coding" → show programming results
```
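
The routing-style use cases above can be wired together with a simple dispatch table; the queue names below are hypothetical placeholders, not part of this model:

```python
# Hypothetical routing targets: replace with your own queues or handlers.
ROUTES = {
    "medicine": "medical_expert_queue",
    "api_generation": "api_team_queue",
    "sensitive": "manual_review_queue",
    "coding": "engineering_queue",
}

def route(text, default="general_queue"):
    domain = classify_domain(text)["primary_domain"]
    return ROUTES.get(domain, default)

print(route("How do I treat a sprained ankle?"))  # medical_expert_queue
```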
## Model Behavior

### Strengths

- ✅ 9 perfect domains (100% precision & recall)
- ✅ High precision across all domains (93-100%)
- ✅ Consistent performance on similar queries
- ✅ Fast inference with LoRA
- ✅ Low memory footprint (~200 MB adapters)
### Limitations

- ⚠️ Multi-domain classification is challenging (71% F1)
  - Queries spanning multiple domains are harder to classify
  - The model tends to pick a single primary domain
- ⚠️ Requires the exact domain list - new domains cannot be handled without retraining
- ⚠️ English only - trained on English text
### Recommendations

- For multi-domain queries, consider an ensemble or multi-label classification approach
- Validate outputs in production with confidence thresholds (see the sketch below)
- Monitor edge cases and collect feedback for model improvements
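
As a minimal sketch of confidence gating, assuming the `confidence` field takes the `high`/`medium`/`low` values shown in the examples above (the exact label set is an assumption):

```python
# Assumption: confidence is reported as "high" / "medium" / "low",
# as in the example outputs earlier in this card.
ACCEPTED_CONFIDENCE = {"high"}

def classify_with_fallback(text, fallback="general_knowledge"):
    result = classify_domain(text)
    if result.get("confidence") not in ACCEPTED_CONFIDENCE:
        # Route low-confidence predictions to a safe default
        # (or to human review, depending on your pipeline).
        return {"primary_domain": fallback, "confidence": result.get("confidence", "low")}
    return result
```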
## Repository Contents

- `adapter_config.json` - LoRA configuration
- `adapter_model.safetensors` - Fine-tuned LoRA weights
- Tokenizer files - Tokenizer configuration
- `test_results.json` - Comprehensive evaluation metrics
- `training_curves.png` - Training/validation loss curves
- `confusion_matrix.png` - Per-domain performance visualization
- `final_dataset_*.csv` - Training/validation/test datasets
## Reproducibility
This model can be reproduced using the exact configuration above. Key factors:
- Seed: 42 (for reproducibility)
- No data augmentation (clean training)
- Exact hyperparameters documented
- Best checkpoint selection (not last)
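
If you rerun training, seeding everything up front is the simplest way to match this setup; `transformers.set_seed` seeds Python's `random`, NumPy, and PyTorch in one call:

```python
from transformers import set_seed

set_seed(42)  # seeds random, numpy and torch (including CUDA) for reproducibility
```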
## Training History
The model was trained with several attempts to optimize performance:
- Original run: 97.97% accuracy (25 epochs)
- 30 epochs attempt: 94.64% - overfitted due to data augmentation
- 50 epochs attempt: ~85-90% - severe overfitting
- Final reproduction: 98.26% - optimal configuration ✅
Key insight: 25 epochs is the sweet spot for this task and dataset.
## License & Citation

### License
This model is released under MIT License. The base Phi-3 model has its own license from Microsoft.
### Citation
If you use this model, please cite:
```bibtex
@misc{phi3-domain-classifier-98,
  author       = {ovinduG},
  title        = {Phi-3 Domain Classification Model - 98.26% Accuracy},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ovinduG/phi3-domain-classifier-98.26}}
}
```
Also cite the original Phi-3 paper:
```bibtex
@article{phi3,
  title  = {Phi-3 Technical Report},
  author = {Microsoft},
  year   = {2024}
}
```
## Contributing
Found an issue or have suggestions? Please open an issue on the model repository.
## Contact
- Author: ovinduG
- Repository: https://huggingface.co/ovinduG/phi3-domain-classifier-98.26
- Upload Date: 2025-12-16
## Acknowledgments
- Microsoft for the Phi-3 base model
- Hugging Face for the transformers library
- PEFT library for LoRA implementation
Model Status: ✅ Production-Ready | Accuracy: 98.26% | Perfect Domains: 9/16