SmolLM2-135M Arabic End-of-Utterance Detector
Fine-tuned SmolLM2-135M model for detecting end-of-utterance (EOU) in Arabic conversations.
Model Description
This model predicts when an Arabic speaker has finished their turn in a conversation based on transcribed speech. It's designed for real-time voice assistants, LiveKit agents, and conversational AI systems.
Key Features:
- 🎯 High Accuracy: F1-Score of 0.913
- 🌍 Multi-Dialect: Supports Levantine, Egyptian, and Gulf Arabic
- ⚡ Fast Inference: 30-50ms per prediction on CPU, 10-20ms on GPU (RTX 4070)
- 🔄 Context-Aware: Can use previous utterances for better predictions
- 🎙️ Production-Ready: Integrated with LiveKit for real-time use
Performance
| Metric | Score |
|---|---|
| F1 Score | 0.913 |
| Accuracy | 0.913 |
| Precision | 0.906 |
| Recall | 0.921 |
| AUC-ROC | 0.958 |
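As a quick consistency check on the table above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the reported values. The variable names are illustrative, and the inputs are the rounded figures from the table, so agreement is only expected to three decimals.

```python
# Recompute F1 from the reported (rounded) precision and recall.
precision = 0.906
recall = 0.921

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.3f}")  # prints: F1 = 0.913
```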
Inference Speed:
- CPU: 30-50ms per prediction
- GPU (RTX 4070): 10-20ms per prediction
- Batch (32 samples): 3-6ms per prediction
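The card does not say how these latencies were measured. One plausible way to reproduce per-prediction timings is sketched below: a hypothetical helper, `measure_latency_ms`, that times any zero-argument callable after a few warm-up calls. The helper name, the warm-up count, and the number of runs are assumptions, not the author's benchmark setup.

```python
import statistics
import time


def measure_latency_ms(predict_fn, warmup=3, runs=20):
    """Return (median, worst) wall-clock latency in milliseconds for predict_fn()."""
    # Warm-up calls are discarded so one-time costs do not skew the results.
    for _ in range(warmup):
        predict_fn()

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings), max(timings)


# Stand-in workload; replace it with a call that runs the model on one utterance.
median_ms, worst_ms = measure_latency_ms(lambda: sum(range(10_000)))
print(f"median: {median_ms:.3f} ms, worst: {worst_ms:.3f} ms")
```

When timing on a GPU, the callable should also wait for the device to finish (for example by calling `torch.cuda.synchronize()` before returning), otherwise the measurement only captures kernel launch time.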
Training Details
Training Data
- Dataset: Reverb/arabic-eou-conversations
- Total Examples: 11,660 (balanced 50/50 EOU/NOT_EOU)
- Dialects:
  - Levantine (شامي)
  - Egyptian (مصري)
  - Gulf (خليجي)
- Split: 80% train, 10% validation, 10% test
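To make the split sizes concrete, the sketch below partitions 11,660 example indices 80/10/10. The function name, the seed, and the use of a plain shuffle are assumptions; the card does not state whether the actual split was stratified by label or dialect.

```python
import random

TOTAL_EXAMPLES = 11_660  # total reported in the model card


def split_indices(n, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle indices 0..n-1 and cut them into train/validation/test lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility

    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


train_idx, val_idx, test_idx = split_indices(TOTAL_EXAMPLES)
print(len(train_idx), len(val_idx), len(test_idx))  # prints: 9328 1166 1166
```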
Training Configuration
- Base Model: HuggingFaceTB/SmolLM2-135M
- Parameters: 135 million
- Hardware: NVIDIA RTX 4070 (8GB VRAM)
- Batch Size: 32 (effective: 64 with gradient accumulation)
- Learning Rate: 2e-5
- Epochs: 5
- Optimizer: AdamW
- Mixed Precision: FP16
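The configuration above implies a rough optimizer step count, which can be useful when choosing warm-up or logging intervals. The sketch below works it out under several assumptions that the card does not confirm: a single GPU, a gradient accumulation factor of 2 (inferred from 32 vs. 64), and a training split of 80% of the 11,660 examples. The dictionary keys borrow Hugging Face `TrainingArguments` field names for readability only; whether the Trainer API was used is not stated.

```python
import math

# Keys mirror TrainingArguments field names for readability only.
config = {
    "per_device_train_batch_size": 32,
    "gradient_accumulation_steps": 2,  # assumption: 32 * 2 = 64 effective
    "learning_rate": 2e-5,
    "num_train_epochs": 5,
    "fp16": True,
}

train_examples = round(11_660 * 0.8)  # assumption: 80% train split of the total

effective_batch = (
    config["per_device_train_batch_size"] * config["gradient_accumulation_steps"]
)
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * config["num_train_epochs"]

print(effective_batch, steps_per_epoch, total_steps)  # prints: 64 146 730
```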
Usage
Quick Start
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    "Reverb/smollm2-135m-arabic-eou"
)
tokenizer = AutoTokenizer.from_pretrained(
    "Reverb/smollm2-135m-arabic-eou"
)

# Predict
text = "شو رأيك نروح نتغدا؟"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
prediction = torch.argmax(probs, dim=-1).item()
confidence = probs[0][prediction].item()
print(f"EOU: {prediction == 1}, Confidence: {confidence:.3f}")
# Output: EOU: True, Confidence: 0.952
With Context
# Using previous utterance as context
context = "كيف حالك؟"
current = "الحمد لله بخير"
text_with_context = f"{context} [SEP] {current}"
inputs = tokenizer(text_with_context, return_tensors="pt", max_length=256, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
is_eou = torch.argmax(probs, dim=-1).item() == 1
confidence = probs[0][1 if is_eou else 0].item()
print(f"EOU: {is_eou}, Confidence: {confidence:.3f}")
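In a live conversation the context string has to be assembled from a running history. The sketch below shows one way to do that with a hypothetical helper, `build_input`, which joins the most recent utterances with the same literal ` [SEP] ` separator used above. The helper name and the `max_turns` parameter are assumptions; the card only shows a single previous utterance, so using more than one turn of context may not match how the model was trained.

```python
SEPARATOR = " [SEP] "  # same literal separator as the example above


def build_input(history, current, max_turns=1):
    """Join up to max_turns previous utterances with the current one."""
    # Drop empty entries so the separator never appears twice in a row.
    previous = [u.strip() for u in history if u.strip()]
    recent = previous[-max_turns:] if max_turns > 0 else []
    return SEPARATOR.join(recent + [current.strip()])


text_with_context = build_input(["كيف حالك؟"], "الحمد لله بخير")
print(text_with_context)
```

With an empty history the helper returns the current utterance unchanged, so the same call site works for the first turn of a conversation.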
Batch Prediction
texts = [
    "شو رأيك",  # Partial - NOT_EOU
    "شو رأيك نروح نتغدا؟"  # Complete - EOU
]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=256)
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
predictions = torch.argmax(probs, dim=-1)
for text, pred, prob in zip(texts, predictions, probs):
    is_eou = pred.item() == 1
    conf = prob[pred].item()
    print(f"'{text}' → {'EOU' if is_eou else 'NOT_EOU'} ({conf:.3f})")
Intended Use
Primary Use Cases
- Voice Assistants: Detect when users finish speaking
- LiveKit Agents: Real-time turn detection in voice conversations
- Dialogue Systems: Turn-taking in conversational AI
- Transcription Systems: Add turn boundaries to speech transcripts
- Conversation Analysis: Analyze turn-taking patterns
Example Applications
Real-time Voice Agent
# Process STT transcription
is_eou, confidence = detect_eou(transcription)
if is_eou and confidence > 0.7:
    # User finished speaking, generate response
    agent_response = generate_response(transcription)
LiveKit Integration
from livekit_eou_sdk import ArabicEOUTurnDetector

detector = ArabicEOUTurnDetector(threshold=0.7)
is_eou, conf = await detector.process_transcription(text, is_final=True)
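Both snippets above gate the agent's response on a 0.7 threshold. Since the confidence is the probability of the predicted class, the condition "is EOU and confidence > 0.7" amounts to requiring the EOU probability itself to exceed 0.7. The sketch below isolates that decision rule in plain Python. `eou_from_logits` is a hypothetical helper, not part of the model repository or the LiveKit SDK, and it assumes the logit order `[NOT_EOU, EOU]` given under Model Architecture.

```python
import math


def eou_from_logits(logits, threshold=0.7):
    """Map [NOT_EOU, EOU] logits to (end_of_turn, p_eou)."""
    # Numerically stable softmax over the two class logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    p_eou = exps[1] / sum(exps)
    # Respond only when the EOU probability clears the threshold.
    return p_eou > threshold, p_eou


# Clear end of turn: the EOU logit dominates.
print(eou_from_logits([-1.0, 2.0]))
# Argmax would say EOU, but the probability is below the threshold.
print(eou_from_logits([0.0, 0.5]))
```

Raising the threshold trades responsiveness for fewer interruptions: the agent waits longer on borderline utterances instead of cutting the speaker off.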
Limitations
- Dialect Coverage: Optimized for Levantine, Egyptian, and Gulf dialects. May not perform as well on other Arabic dialects.
- Formal Arabic: Designed for conversational/colloquial Arabic. Performance on Modern Standard Arabic (MSA) or Classical Arabic may vary.
- Domain: Trained on general conversational data. May require fine-tuning for specialized domains (medical, legal, etc.).
- Context: Best results when using conversation context. Single utterances without context may have lower accuracy.
- Spoken Language: Designed for transcribed spoken language, not written text.
Bias and Fairness
- The model was trained on balanced data across three major Arabic dialects
- Performance is consistent across all three dialects (Levantine, Egyptian, Gulf)
- May have reduced performance on underrepresented dialects or regional variations
- No demographic or gender-based biases were intentionally introduced
Model Architecture
- Type: Sequence Classification (Binary)
- Base: LlamaForSequenceClassification (SmolLM2-135M)
- Input: Arabic text (max 256 tokens)
- Output: Binary classification (0=NOT_EOU, 1=EOU)
- Classes: 2 (NOT_EOU, EOU)
- Model Size: ~270MB
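The stated size is consistent with storing each parameter as a 16-bit float. The arithmetic below is a back-of-the-envelope check that assumes 2 bytes per parameter and ignores the classification head and file-format overhead; the card does not state the storage dtype.

```python
PARAMS = 135_000_000   # parameter count from the model card
BYTES_PER_PARAM = 2    # assumption: weights stored as 16-bit floats

size_mb = PARAMS * BYTES_PER_PARAM / 1_000_000  # decimal megabytes
print(f"~{size_mb:.0f} MB")  # prints: ~270 MB
```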
Citation
@misc{arabic-eou-detector-2025,
author = {Reverb},
title = {Arabic End-of-Utterance Detector},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/Reverb/smollm2-135m-arabic-eou}},
note = {Fine-tuned SmolLM2-135M for Arabic EOU detection}
}
License
MIT License
Acknowledgments
- Base Model: SmolLM2-135M by Hugging Face
- Framework: PyTorch, Transformers
- Dataset: Arabic EOU Conversations
Contact
For questions or issues, please open an issue on the model repository.
Related Resources
- Dataset: Reverb/arabic-eou-conversations
- Code Repository: Available in model files
- LiveKit SDK: Included for real-time integration
Model Card Version: 1.0
Last Updated: December 2025