🏷️ Urdu Turn Detection Model (DistilBERT)
This model detects End-of-Turn (EoT) in Urdu speech transcripts, classifying each sentence as:
- Complete → The speaker has finished their turn
- Incomplete → The speaker is pausing, trailing off, or not yet done
While this may appear similar to Voice Activity Detection (VAD), it solves a different and more linguistically complex problem:
VAD vs. Turn Detection (Key Difference)
VAD only detects whether sound is present or absent, i.e.:
➡️ “Is the speaker currently making noise or not?”
It cannot determine whether a sentence is logically complete, because it relies only on raw audio features (energy, frequency, silence gaps).
Turn Detection, however, is a semantic task:
➡️ “Has the speaker finished their thought, or are they about to continue?”
This requires understanding grammar, syntax, pause structures, and sentence completeness, which VAD does not and cannot evaluate.
Example:
| Utterance | VAD Output | Turn Detection Output |
|---|---|---|
| "Ψ§Ϊ―Ψ± ΨͺΩ ΩΩΨͺ ΩΎΨ± Ψ’ΨͺΫ ΨͺΩ..." | Speech present | Incomplete (thought not finished) |
| "Ω ΫΪΊ Ϊ―ΪΎΨ± Ψ¬Ψ§ Ψ±ΫΨ§ ΫΩΪΊΫ" | Speech present | Complete |
| 1 sec silence | Silence | Not applicable |
Thus, this model complements VAD:
- VAD → detects audio boundaries
- Turn Detection → detects linguistic boundaries
It is fine-tuned on a 10,000-sample Urdu dataset and optimized for real-time deployment in conversational AI, ASR pipelines, and voice assistants.
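To make the complementary roles concrete, the sketch below shows the linguistic half of such a pipeline: once VAD (or any segmenter) closes an audio segment and ASR returns a transcript, the model decides whether to respond or keep listening. The helper name `is_turn_complete` and the 0.5 threshold are illustrative choices, not part of the released code; the label index follows the usage example further down this card.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "PuristanLabs1/urdu-turn-detection-distilbert"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

def is_turn_complete(transcript: str, threshold: float = 0.5) -> bool:
    """Decide whether an ASR transcript looks like a finished turn.

    In a live pipeline, `transcript` would come from your ASR system for the
    speech segment that VAD just closed; this model then supplies the
    linguistic (rather than acoustic) end-of-turn decision.
    """
    inputs = tokenizer(transcript, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)
    return probs[0][1].item() > threshold  # assumes index 1 = "Complete", as in the usage section

# After VAD detects a pause and ASR returns the text:
print(is_turn_complete("میں گھر جا رہا ہوں"))   # expected: True  (complete turn)
print(is_turn_complete("اگر تم وقت پر آتے تو"))  # expected: False (trailing thought)
```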
Model Variants
| Variant | Description | Size | Latency (CPU) | F1 Score |
|---|---|---|---|---|
| Base | Fine-tuned distilbert-base-multilingual-cased | ~516 MB | ~40 ms | 97.2% |
| Quantized | Dynamic INT8 quantization | ~393 MB | ~14 ms | 95.9% |
Performance
Evaluation on a held-out test set of 1,000 Urdu samples:
| Metric | Base Model | Quantized (INT8) |
|---|---|---|
| Accuracy | 97.2% | 95.9% |
| F1 Score | 97.2% | 95.9% |
| Precision | 96.9% | 96.8% |
| Recall | 97.6% | 95.1% |
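The held-out test set is not bundled with the model, so these numbers cannot be reproduced directly from this repository. To compute the same four metrics on your own labelled Urdu sentences, a minimal scikit-learn sketch (with placeholder `texts`/`labels`, assuming label 1 = Complete and 0 = Incomplete) could look like this:

```python
import torch
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "PuristanLabs1/urdu-turn-detection-distilbert"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

# Placeholder test data: replace with your own labelled sentences.
texts = ["میں گھر جا رہا ہوں", "اگر تم وقت پر آتے تو"]
labels = [1, 0]  # assumed convention: 1 = Complete, 0 = Incomplete

preds = []
with torch.no_grad():
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        preds.append(int(model(**inputs).logits.argmax(dim=-1)))

print("Accuracy :", accuracy_score(labels, preds))
print("F1       :", f1_score(labels, preds))
print("Precision:", precision_score(labels, preds))
print("Recall   :", recall_score(labels, preds))
```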
Dataset Details
- Total Samples: 10,000
- Label Balance: 50% Complete, 50% Incomplete
- Sources:
- 2,825 validated real-world Urdu EoT samples
- 7,175 synthetic Urdu samples
- Script: 100% Urdu (Nastaliq/Arabic), no Roman Urdu
- Domains: everyday conversational patterns, trailing constructs, pauses
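The dataset files themselves are not published with the model, so the exact storage format is not documented here; conceptually, each example is just one sentence paired with a binary label, along the lines of this hypothetical illustration:

```python
# Hypothetical illustration of individual examples; the actual dataset files
# and field names are not published with this model.
examples = [
    {"text": "میں گھر جا رہا ہوں", "label": "Complete"},    # finished turn
    {"text": "اگر تم وقت پر آتے تو", "label": "Incomplete"},  # trailing / unfinished thought
]
```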
Technical Specifications
Training Configuration
| Parameter | Value |
|---|---|
| Base Model | distilbert-base-multilingual-cased |
| Fine-tuning Method | Full Fine-tuning |
| Dataset | Urdu Turn Detection (~10,000 examples) |
| Language | Urdu (Ψ§Ψ±Ψ―Ω) |
| Learning Rate | 2e-5 |
| Scheduler | Linear Decay |
| Batch Size | 128 per device |
| Gradient Accumulation | 1 step |
| Max Sequence Length | 128 tokens |
| Optimizer | AdamW |
| Training Epochs | 5 |
| Total Steps | ~315 steps (63 steps/epoch) |
| Floating Point | FP16 (Mixed Precision) |
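The configuration above maps almost directly onto Hugging Face `TrainingArguments`; the sketch below shows how such a run could be set up under those settings. It is an approximation: argument names follow current `transformers` releases, the output directory is arbitrary, and the dataset loading is left as a placeholder because the training data is not released.

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
)

base_model = "distilbert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)

training_args = TrainingArguments(
    output_dir="urdu-turn-detection-distilbert",  # arbitrary output path
    learning_rate=2e-5,                 # Learning Rate
    lr_scheduler_type="linear",         # Scheduler: Linear Decay
    per_device_train_batch_size=128,    # Batch Size per device
    gradient_accumulation_steps=1,      # Gradient Accumulation
    num_train_epochs=5,                 # Training Epochs
    fp16=True,                          # Mixed precision (FP16)
)

# Sequences would be tokenized with truncation to 128 tokens (Max Sequence Length).
# train_dataset / eval_dataset are placeholders: the ~10,000-example Urdu
# dataset used for this model is not published alongside it.
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```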
Dataset Statistics
| Feature | Details |
|---|---|
| Total Examples | 10,000 |
| Train Set | 8,000 examples (80%) |
| Validation Set | 1,000 examples (10%) |
| Test Set | 1,000 examples (10%) |
| Balance | 50% Complete / 50% Incomplete |
| Source | Real-world validation (28%) + Quality Synthetic (72%) |
Training Metrics (Approximate)
| Metric | Value |
|---|---|
| Final Test Loss | ~0.08 |
| Validation F1 | 97.2% |
| Test F1 | 97.2% (Base) / 95.9% (Quantized) |
Trainable Parameters
- Total Parameters: ~135,000,000
- Trainable Parameters: ~135,000,000 (100%)
- Model Size: ~516 MB (FP32) -> ~393 MB (INT8)
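These counts can be checked directly against the published checkpoint; the snippet below prints the parameter totals and an estimated FP32 footprint (the 4-bytes-per-parameter size estimate is an approximation):

```python
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "PuristanLabs1/urdu-turn-detection-distilbert"
)

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
fp32_mb = total * 4 / (1024 ** 2)  # 4 bytes per FP32 parameter

print(f"Total parameters    : {total:,}")
print(f"Trainable parameters: {trainable:,}")
print(f"Approx. FP32 size   : {fp32_mb:.0f} MB")
```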
Usage
1️⃣ Using the Base Model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "PuristanLabs1/urdu-turn-detection-distilbert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "میں گھر جا رہا ہوں"  # Complete
# text = "اگر تم وقت پر آتے تو"  # Incomplete

inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)
score = probs[0][1].item()  # probability of the "Complete" class
label = "Complete" if score > 0.5 else "Incomplete"
print(f"Prediction: {label} ({score:.2f})")
```
2️⃣ Using the Quantized Model (Fast Inference)
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "PuristanLabs1/urdu-turn-detection-distilbert-quantized"

# Load architecture + tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Apply dynamic INT8 quantization to the Linear layers
model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Load the pre-quantized weights
state_dict = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/PuristanLabs1/urdu-turn-detection-distilbert-quantized/resolve/main/quantized_model.pt",
    map_location="cpu",
)
model.load_state_dict(state_dict)

# Inference is the same as with the base model.
```
🎯 Intended Use Cases
- Voice Assistants
- Dialogue Systems
- Real-time ASR segmentation
- Call Center AI / IVR turn-taking models
⚠️ Limitations
- No acoustic/prosodic features (text-only model)
- Short ambiguous utterances may require context
- Should not be used alone in safety-critical systems
License
MIT License