# Urdu Turn Detection Model (XLM-RoBERTa-Base)
This model is a fine-tuned version of xlm-roberta-base for Urdu Turn Detection (End-of-Turn). It serves as a high-accuracy teacher model for smaller distilled variants (such as DistilBERT) or as a robust standalone model for server-side inference.
This model detects End-of-Turn (EoT) in Urdu speech transcripts, classifying each sentence as:
- Complete → The speaker has finished their turn
- Incomplete → The speaker is pausing, trailing off, or otherwise not yet done
While this may appear similar to Voice Activity Detection (VAD), it solves a different and more linguistically complex problem:
## 🔍 VAD vs. Turn Detection (Key Difference)
VAD only detects whether sound is present or absent — i.e.,
➡️ “Is the speaker currently making noise or not?”
It cannot determine whether a sentence is logically complete, because it relies only on raw audio features (energy, frequency, silence gaps).
Turn Detection, however, is a semantic task:
➡️ “Has the speaker finished their thought, or are they about to continue?”
This requires understanding grammar, syntax, pause structures, and sentence completeness — something VAD does not and cannot evaluate.
Example:
| Utterance | VAD Output | Turn Detection Output |
|---|---|---|
| "اگر تم وقت پر آتے تو..." | Speech present | Incomplete (thought not finished) |
| "میں گھر جا رہا ہوں۔" | Speech present | Complete |
| 1 sec silence | Silence | Not applicable |
Thus, this model complements VAD, as sketched after this list:
- VAD → detects audio boundaries
- Turn Detection → detects linguistic boundaries
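A minimal sketch of how the two signals can be combined in a streaming pipeline. The `should_end_turn` and `is_turn_complete` helpers, and the VAD pause flag they consume, are illustrative assumptions rather than part of this model's API; any VAD component (e.g. Silero or WebRTC VAD) can supply the acoustic side:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "PuristanLabs1/xlm-roberta-urdu-turn-detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def is_turn_complete(transcript: str) -> bool:
    """Classify a running ASR transcript as a finished turn (label 1)."""
    inputs = tokenizer(transcript, truncation=True, max_length=64, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.argmax(dim=-1).item() == 1

def should_end_turn(vad_detected_pause: bool, transcript: str) -> bool:
    # VAD supplies the acoustic boundary; this model supplies the
    # linguistic one. Only hand the turn over when both agree.
    return vad_detected_pause and is_turn_complete(transcript)
```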
It is fine-tuned on a 10,000-sample Urdu dataset and optimized for real-time deployment in conversational AI, ASR pipelines, and voice assistants.
## Model Details
- Base Model: xlm-roberta-base
- Task: Binary Text Classification (Complete vs. Incomplete)
- Language: Urdu (اردو)
- Dataset: 10,000 samples (Real-world + High-quality Synthetic)
## Performance
### Training Metrics
| Metric | Value |
|---|---|
| Accuracy | ~97.5% |
| F1 Score | ~97.5% |
| Precision | ~97% |
| Recall | ~98% |
## Technical Specifications
### Training Configuration
| Parameter | Value |
|---|---|
| Base Model | xlm-roberta-base |
| Fine-tuning Method | Full Fine-tuning |
| Dataset | Urdu Turn Detection (~10,000 examples) |
| Learning Rate | 2e-5 |
| Batch Size | 16 effective (8 per device × gradient accumulation 2) |
| Optimizer | AdamW |
| Max Sequence Length | 64 tokens |
| Epochs | 3 |
| Floating Point | FP16 (Mixed Precision) |
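For reference, a sketch of how the table above maps onto Hugging Face `TrainingArguments`. The `output_dir` is a placeholder (not from the original run), and the max sequence length of 64 is applied at tokenization time rather than here:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./xlm-roberta-urdu-turn-detection",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,  # effective batch size 16
    num_train_epochs=3,
    fp16=True,                      # mixed-precision training
    optim="adamw_torch",            # AdamW optimizer
)
```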
### Trainable Parameters
- Total Parameters: ~278M
- Model Size: ~1.1 GB
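Both figures are easy to verify after loading the checkpoint; a quick sketch:

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "PuristanLabs1/xlm-roberta-urdu-turn-detection"
)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # expected: ~278M
```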
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "PuristanLabs1/xlm-roberta-urdu-turn-detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "میں گھر جا رہا ہوں"  # "I am going home" -> Complete
# text = "اگر میں وہاں ہوتا..."  # "If I were there..." -> Incomplete

# Tokenize and run a single forward pass without gradient tracking
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Label 1 = Complete, label 0 = Incomplete
label = torch.argmax(logits, dim=-1).item()
print("Complete" if label == 1 else "Incomplete")
```