Urdu Turn Detection Model (XLM-RoBERTa-Base)

This model is a fine-tuned version of xlm-roberta-base for Urdu Turn Detection (End-of-Turn). It serves as a high-accuracy teacher model for smaller distilled variants (such as DistilBERT) or as a robust standalone model for server-side inference.
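Since the card positions this model as a teacher, here is a minimal sketch of the standard soft-label distillation objective a student could be trained against. This is an illustrative assumption, not the project's released training code; the temperature `T` and mixing weight `alpha` are placeholder values.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend softened teacher targets with the usual hard-label loss.

    T and alpha are illustrative defaults, not this project's recipe.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```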

This model detects End-of-Turn (EoT) in Urdu speech transcripts, classifying each sentence as:

  • Complete → The speaker has finished their turn
  • Incomplete → The speaker is pausing / trailing off / not yet done

While this may appear similar to Voice Activity Detection (VAD), it solves a different and more linguistically complex problem:

🔍 VAD vs. Turn Detection (Key Difference)

VAD only detects whether sound is present or absent — i.e.,
➡️ “Is the speaker currently making noise or not?”

It cannot determine whether a sentence is logically complete, because it relies only on raw audio features (energy, frequency, silence gaps).

Turn Detection, however, is a semantic task:
➡️ “Has the speaker finished their thought, or are they about to continue?”

This requires understanding grammar, syntax, pause structures, and sentence completeness — something VAD does not and cannot evaluate.

Example:

| Utterance | VAD Output | Turn Detection Output |
| --- | --- | --- |
| "اگر تم وقت پر آتے تو..." ("If you had come on time, then...") | Speech present | Incomplete (thought not finished) |
| "میں گھر جا رہا ہوں۔" ("I am going home.") | Speech present | Complete |
| 1 s of silence | Silence | Not applicable |

Thus, this model complements VAD (see the combined sketch after this list):

  • VAD → detects audio boundaries
  • Turn Detection → detects linguistic boundaries
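A minimal sketch of combining the two in a streaming loop, assuming your VAD reports the current silence duration and your ASR supplies a partial transcript (both are stand-ins for your own components; the 300 ms threshold is illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "PuristanLabs1/xlm-roberta-urdu-turn-detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

def turn_is_complete(text: str) -> bool:
    # Linguistic boundary: does the transcript read as a finished thought?
    inputs = tokenizer(text, truncation=True, max_length=64, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.argmax(dim=-1).item() == 1  # 1 = Complete (see Usage below)

def should_end_turn(silence_ms: float, transcript: str,
                    min_silence_ms: float = 300.0) -> bool:
    # Acoustic boundary (from VAD) and linguistic boundary must both agree
    # before the assistant takes its turn. min_silence_ms is an assumed value.
    return silence_ms >= min_silence_ms and turn_is_complete(transcript)
```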

It is fine-tuned on a 10,000-sample Urdu dataset and optimized for real-time deployment in conversational AI, ASR pipelines, and voice assistants.


Model Details

  • Base Model: xlm-roberta-base
  • Task: Binary Text Classification (Complete vs. Incomplete)
  • Language: Urdu (اردو)
  • Dataset: 10,000 samples (Real-world + High-quality Synthetic)

Performance

Training Metrics

| Metric | Value |
| --- | --- |
| Accuracy | ~97.5% |
| F1 Score | ~97.5% |
| Precision | ~97% |
| Recall | ~98% |

Technical Specifications

Training Configuration

| Parameter | Value |
| --- | --- |
| Base Model | xlm-roberta-base |
| Fine-tuning Method | Full fine-tuning |
| Dataset | Urdu Turn Detection (~10,000 examples) |
| Learning Rate | 2e-5 |
| Batch Size | 16 (8 per device, gradient accumulation 2) |
| Optimizer | AdamW |
| Max Sequence Length | 64 tokens |
| Epochs | 3 |
| Numeric Precision | FP16 (mixed precision) |
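The table maps directly onto transformers.TrainingArguments. A minimal sketch follows; the output_dir name and the train_ds / eval_ds placeholders are assumptions, and dataset loading plus tokenization at max_length=64 are omitted:

```python
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2
)

args = TrainingArguments(
    output_dir="xlm-roberta-urdu-turn-detection",  # assumed name
    learning_rate=2e-5,
    per_device_train_batch_size=8,   # effective batch size 16 with
    gradient_accumulation_steps=2,   # 2 accumulation steps
    num_train_epochs=3,
    fp16=True,                       # mixed precision, as in the table
    optim="adamw_torch",             # AdamW
)

# train_ds / eval_ds: pre-tokenized datasets (max_length=64), assumed here
trainer = Trainer(model=model, args=args,
                  train_dataset=train_ds, eval_dataset=eval_ds)
trainer.train()
```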

Trainable Parameters

  • Total Parameters: ~278M
  • Model Size: ~1.1 GB
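A quick way to confirm the parameter count from the released checkpoint:

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "PuristanLabs1/xlm-roberta-urdu-turn-detection"
)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # expect ≈ 278M
```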

Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "PuristanLabs1/xlm-roberta-urdu-turn-detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

text = "میں گھر جا رہا ہوں"  # Complete ("I am going home.")
# text = "اگر میں وہاں ہوتا..."  # Incomplete ("If I were there...")

# Truncate to the 64-token training length
inputs = tokenizer(text, truncation=True, max_length=64, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
    label = torch.argmax(logits, dim=-1).item()

# Label mapping: 1 = Complete, 0 = Incomplete
print("Complete" if label == 1 else "Incomplete")
```
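For streaming use, a probability is often more useful than a hard argmax. Continuing from the snippet above (the threshold you apply to it is a pipeline-specific choice):

```python
# Probability of the "Complete" class (index 1, same mapping as above);
# useful for tuning an end-of-turn confidence threshold.
probs = torch.softmax(logits, dim=-1)
p_complete = probs[0, 1].item()
print(f"P(Complete) = {p_complete:.3f}")
```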