# Arabic End-of-Utterance (EOU) Detector

Detect when a speaker has finished their utterance in Arabic conversations.

This model is fine-tuned from AraBERT v2 for binary classification of Arabic text: it determines whether an utterance is complete (EOU) or incomplete (No EOU).

## Model Description
- Model Type: BERT-based binary classifier
- Base Model: aubmindlab/bert-base-arabertv2
- Language: Arabic (ar)
- Task: End-of-Utterance Detection
- License: Apache 2.0
## Performance
| Metric | Value |
|---|---|
| Accuracy | 90% |
| Precision (EOU) | 0.90 |
| Recall (EOU) | 0.93 |
| F1-Score (EOU) | 0.92 |
| Test Samples | 1,001 |
### Confusion Matrix

|                   | Predicted No EOU | Predicted EOU | Per-class accuracy |
|-------------------|------------------|---------------|--------------------|
| **Actual No EOU** | 333              | 62            | 84.3%              |
| **Actual EOU**    | 42               | 564           | 93.1%              |
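The reported metrics follow directly from this matrix; a quick sanity check:

```python
# Sanity-check the reported metrics against the confusion matrix.
tn, fp = 333, 62   # actual No EOU row
fn, tp = 42, 564   # actual EOU row

accuracy = (tp + tn) / (tp + tn + fp + fn)                # 897 / 1001 ≈ 0.896
precision = tp / (tp + fp)                                # 564 / 626  ≈ 0.901
recall = tp / (tp + fn)                                   # 564 / 606  ≈ 0.931
f1 = 2 * precision * recall / (precision + recall)        # ≈ 0.916

print(f"acc={accuracy:.3f}  p={precision:.3f}  r={recall:.3f}  f1={f1:.3f}")
```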
## Available Formats

This repository includes three model formats:

- PyTorch (`pytorch_model.bin` or `model.safetensors`) - For training and fine-tuning
- ONNX (`model.onnx`) - For optimized CPU/GPU inference (~2-3x faster)
- Quantized ONNX (`model_quantized.onnx`) - For production (75% smaller, 2-3x faster); see the quantization sketch below
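The quantized variant can be reproduced with ONNX Runtime's dynamic quantization. A minimal sketch; the exact settings behind `model_quantized.onnx` are an assumption, not a record of this repository's build:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamic INT8 weight quantization of the exported model.
# Assumed settings: this is the typical way such a file is produced.
quantize_dynamic(
    model_input="model.onnx",
    model_output="model_quantized.onnx",
    weight_type=QuantType.QInt8,
)
```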
## Quick Start

### Installation

```bash
pip install transformers torch onnxruntime
```
### PyTorch Inference

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "your-username/arabic-eou-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Inference
def predict_eou(text: str):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    probs = torch.softmax(logits, dim=-1)
    is_eou = torch.argmax(probs, dim=-1).item() == 1
    confidence = probs[0, 1].item()
    return is_eou, confidence

# Test
text = "مرحبا كيف حالك"  # "Hello, how are you?"
is_eou, conf = predict_eou(text)
print(f"Is EOU: {is_eou}, Confidence: {conf:.4f}")
```
### ONNX Inference (Recommended for Production)

```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer

# Load tokenizer
model_name = "your-username/arabic-eou-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load ONNX model (use model_quantized.onnx for best performance)
session = ort.InferenceSession(
    "model_quantized.onnx",  # or "model.onnx"
    providers=["CPUExecutionProvider"],
)

# Inference
def predict_eou(text: str):
    inputs = tokenizer(
        text,
        padding="max_length",
        max_length=512,
        truncation=True,
        return_tensors="np",
    )
    outputs = session.run(
        None,
        {
            "input_ids": inputs["input_ids"].astype(np.int64),
            "attention_mask": inputs["attention_mask"].astype(np.int64),
        },
    )
    logits = outputs[0]
    # Softmax over the two classes
    probs = np.exp(logits) / np.sum(np.exp(logits), axis=-1, keepdims=True)
    is_eou = np.argmax(probs, axis=-1)[0] == 1
    confidence = float(probs[0, 1])
    return is_eou, confidence

# Test
text = "مرحبا كيف حالك"  # "Hello, how are you?"
is_eou, conf = predict_eou(text)
print(f"Is EOU: {is_eou}, Confidence: {conf:.4f}")
```
## Use Cases

- Voice Assistants: Detect when the user has finished speaking (see the sketch below)
- Conversational AI: Improve turn-taking in Arabic chatbots
- LiveKit Agents: Custom turn detection for Arabic conversations
- Speech Recognition: Post-processing for better utterance segmentation
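For the turn-taking scenarios above, a common pattern is to score each growing partial transcript and end the turn only once EOU confidence clears a threshold. A minimal sketch reusing the `predict_eou` function from the Quick Start; the 0.7 threshold is an illustrative assumption, not a tuned value:

```python
EOU_THRESHOLD = 0.7  # assumption: tune on your own validation data

def should_end_turn(partial_transcript: str) -> bool:
    """Return True once the model is confident the utterance is complete."""
    is_eou, confidence = predict_eou(partial_transcript)
    return is_eou and confidence >= EOU_THRESHOLD

# As the ASR stream grows, check each partial transcript:
for partial in ["مرحبا", "مرحبا كيف", "مرحبا كيف حالك"]:
    if should_end_turn(partial):
        print(f"End of turn after: {partial}")
        break
```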
## Integration with LiveKit

```python
from huggingface_hub import hf_hub_download
from livekit.agents import AgentSession
from livekit.plugins.arabic_turn_detector import ArabicTurnDetector

# Download the quantized model from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="your-username/arabic-eou-detector",
    filename="model_quantized.onnx",
)

# Create turn detector
turn_detector = ArabicTurnDetector(
    model_path=model_path,
    unlikely_threshold=0.7,
)

# Use in agent
session = AgentSession(
    turn_detector=turn_detector,
    # ... other config
)
```
## Training Details

### Training Data

- Dataset: Arabic EOU Detection (10,072 samples)
- Train/Val/Test Split: 80/10/10
- Classes:
  - `0`: Incomplete utterance (No EOU)
  - `1`: Complete utterance (EOU)
### Training Hyperparameters
- Base Model: aubmindlab/bert-base-arabertv2
- Learning Rate: 2e-5
- Batch Size: 32
- Epochs: 10
- Optimizer: AdamW
- Weight Decay: 0.01
- Max Sequence Length: 512
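A sketch of the corresponding fine-tuning setup with the Hugging Face `Trainer`; the tiny inline dataset is a placeholder for the real EOU data, and only the hyperparameters listed above come from this card:

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base = "aubmindlab/bert-base-arabertv2"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# Placeholder data; replace with the real 10,072-sample EOU dataset.
raw = Dataset.from_dict({
    "text": ["مرحبا كيف حالك", "أريد أن"],  # complete vs. incomplete utterance
    "label": [1, 0],
})
ds = raw.map(
    lambda b: tokenizer(b["text"], truncation=True, padding="max_length", max_length=512),
    batched=True,
)

args = TrainingArguments(
    output_dir="arabic-eou-detector",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    num_train_epochs=10,
    weight_decay=0.01,  # Trainer's default optimizer is AdamW
)

trainer = Trainer(model=model, args=args, train_dataset=ds, eval_dataset=ds)
trainer.train()
```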
### Preprocessing
- AraBERT normalization (diacritics removal, character normalization)
- Tokenization with AraBERT tokenizer
- Padding to max length (512 tokens)
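The same normalization step is exposed by the `arabert` package. A minimal sketch of applying it before tokenization, assuming `pip install arabert`:

```python
from arabert.preprocess import ArabertPreprocessor
from transformers import AutoTokenizer

# AraBERT normalization (diacritics removal, character normalization),
# mirroring the preprocessing described above.
prep = ArabertPreprocessor(model_name="aubmindlab/bert-base-arabertv2")
tokenizer = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabertv2")

text = prep.preprocess("مَرْحَباً كيف حالك")  # diacritics are stripped
inputs = tokenizer(text, truncation=True, padding="max_length",
                   max_length=512, return_tensors="pt")
```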
## Limitations
- Language: Optimized for Modern Standard Arabic (MSA)
- Domain: Trained on conversational Arabic text
- Sequence Length: Maximum 512 tokens
- Dialects: May have reduced accuracy on dialectal Arabic
## Citation

If you use this model, please cite:

```bibtex
@misc{arabic-eou-detector,
  author = {Your Name},
  title = {Arabic End-of-Utterance Detector},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/your-username/arabic-eou-detector}}
}
```
## License
Apache 2.0
## Acknowledgments
- AraBERT: aubmindlab/bert-base-arabertv2
- HuggingFace Transformers: Model training and inference
- ONNX Runtime: Model optimization and deployment
## Contact
For issues or questions, please open an issue on the GitHub repository.