# Urdu Turn Detection Model (XLM-RoBERTa-Base)
This model is a fine-tuned version of xlm-roberta-base for Urdu Turn Detection (End-of-Turn). It serves as a high-accuracy teacher model for smaller distilled variants (such as DistilBERT) or as a robust standalone model for server-side inference.
This model detects End-of-Turn (EoT) in Urdu speech transcripts, classifying each sentence as:
- Complete → The speaker has finished their turn
- Incomplete → The speaker is pausing, trailing off, or otherwise not yet done
While this may appear similar to Voice Activity Detection (VAD), it solves a different and more linguistically complex problem:
## 🔍 VAD vs. Turn Detection (Key Difference)
VAD only detects whether sound is present or absent — i.e.,
➡️ “Is the speaker currently making noise or not?”
It cannot determine whether a sentence is logically complete, because it relies only on raw audio features (energy, frequency, silence gaps).
Turn Detection, however, is a semantic task:
➡️ “Has the speaker finished their thought, or are they about to continue?”
This requires understanding grammar, syntax, pause structures, and sentence completeness — something VAD does not and cannot evaluate.
Example:
| Utterance | VAD Output | Turn Detection Output |
|---|---|---|
| "اگر تم وقت پر آتے تو..." | Speech present | Incomplete (thought not finished) |
| "میں گھر جا رہا ہوں۔" | Speech present | Complete |
| 1 sec silence | Silence | Not applicable |
Thus, this model complements VAD, as sketched after this list:
- VAD → detects audio boundaries
- Turn Detection → detects linguistic boundaries
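A minimal sketch of how the two signals can be combined in a streaming pipeline. The `should_end_turn` and `is_turn_complete` helpers, and the VAD pause flag they consume, are illustrative assumptions rather than part of this model's API; any VAD component (e.g. Silero or WebRTC VAD) can supply the acoustic side:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "PuristanLabs1/xlm-roberta-urdu-turn-detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def is_turn_complete(transcript: str) -> bool:
    """Classify a running ASR transcript as a finished turn (label 1)."""
    inputs = tokenizer(transcript, truncation=True, max_length=64, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.argmax(dim=-1).item() == 1

def should_end_turn(vad_detected_pause: bool, transcript: str) -> bool:
    # VAD supplies the acoustic boundary; this model supplies the
    # linguistic one. Only hand the turn over when both agree.
    return vad_detected_pause and is_turn_complete(transcript)
```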
It is fine-tuned on a 10,000-sample Urdu dataset and optimized for real-time deployment in conversational AI, ASR pipelines, and voice assistants.
## Model Details
- Base Model: xlm-roberta-base
- Task: Binary Text Classification (Complete vs. Incomplete)
- Language: Urdu (اردو)
- Dataset: 10,000 samples (Real-world + High-quality Synthetic)
## Performance
### Training Metrics
| Metric | Value |
|---|---|
| Accuracy | ~97.5% |
| F1 Score | ~97.5% |
| Precision | ~97% |
| Recall | ~98% |
## Technical Specifications
### Training Configuration
| Parameter | Value |
|---|---|
| Base Model | xlm-roberta-base |
| Fine-tuning Method | Full Fine-tuning |
| Dataset | Urdu Turn Detection (~10,000 examples) |
| Learning Rate | 2e-5 |
| Batch Size | 16 effective (8 per device × gradient accumulation 2) |
| Optimizer | AdamW |
| Max Sequence Length | 64 tokens |
| Epochs | 3 |
| Floating Point | FP16 (Mixed Precision) |
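For reference, a sketch of how the table above maps onto Hugging Face `TrainingArguments`. The `output_dir` is a placeholder (not from the original run), and the max sequence length of 64 is applied at tokenization time rather than here:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./xlm-roberta-urdu-turn-detection",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,  # effective batch size 16
    num_train_epochs=3,
    fp16=True,                      # mixed-precision training
    optim="adamw_torch",            # AdamW optimizer
)
```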
### Trainable Parameters
- Total Parameters: ~278M
- Model Size: ~1.1 GB
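Both figures are easy to verify after loading the checkpoint; a quick sketch:

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "PuristanLabs1/xlm-roberta-urdu-turn-detection"
)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # expected: ~278M
```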
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "PuristanLabs1/xlm-roberta-urdu-turn-detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "میں گھر جا رہا ہوں"  # "I am going home" -> Complete
# text = "اگر میں وہاں ہوتا..."  # "If I were there..." -> Incomplete

# Tokenize and run a single forward pass without gradient tracking
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Label 1 = Complete, label 0 = Incomplete
label = torch.argmax(logits, dim=-1).item()
print("Complete" if label == 1 else "Incomplete")
```