Arabic End-of-Utterance Detection Model

Model Description

This model detects End-of-Utterance (EOU) in Arabic conversations and is optimized for Saudi dialects. Given the text transcription of what a speaker has said so far, it predicts the probability that the speaker has finished their conversational turn.

Use Case: Real-time conversational AI agents (voice assistants, chatbots, customer service)

Performance

Metric         | Score
---------------|----------
Test Accuracy  | 99.6%
Precision      | 100%
Recall         | 99.45%
F1 Score       | 99.73%
AUC-ROC        | 99.96%
Inference Time | ~15-20 ms

Training Data

  • Total samples: 5,000
  • SADA22 (Real Saudi audio): 104 samples (2.1%)
  • Synthetic (Saudi patterns): 4,896 samples (97.9%)
  • Splits: 80% train / 10% validation / 10% test
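
With 5,000 samples, the 80/10/10 split corresponds to 4,000 training, 500 validation, and 500 test samples. The card does not say how the split was drawn (for example, whether it was stratified by label or by source), so the following is only a minimal sketch that assumes a seeded random shuffle; `split_indices` is a hypothetical helper, not part of the released code.

```python
import random

def split_indices(n_samples, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test

train_idx, val_idx, test_idx = split_indices(5000)
print(len(train_idx), len(val_idx), len(test_idx))  # 4000 500 500
```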

Quick Start

Installation

pip install transformers torch

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model
model = AutoModelForSequenceClassification.from_pretrained("HossamEL-Dein/arabic-eou-model")
tokenizer = AutoTokenizer.from_pretrained("HossamEL-Dein/arabic-eou-model")
model.eval()

# Predict EOU
text = "مرحبا كيف حالك اليوم"  # "Hello, how are you today?"
# Truncate to the model's 128-token input limit
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    eou_probability = probs[0][1].item()

print(f"EOU Probability: {eou_probability:.2%}")
# Output: EOU Probability: 98.56%

Integration with LiveKit

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

class EOUDetector:
    def __init__(self, threshold=0.7):
        self.model = AutoModelForSequenceClassification.from_pretrained("HossamEL-Dein/arabic-eou-model")
        self.tokenizer = AutoTokenizer.from_pretrained("HossamEL-Dein/arabic-eou-model")
        self.model.eval()
        self.threshold = threshold
    
    def check_eou(self, transcript_text):
        # Truncate to the model's 128-token input limit
        inputs = self.tokenizer(transcript_text, return_tensors="pt", truncation=True, max_length=128)
        with torch.no_grad():
            outputs = self.model(**inputs)
            probs = torch.softmax(outputs.logits, dim=-1)
            eou_prob = probs[0][1].item()
        
        return {
            'probability': eou_prob,
            'is_eou': eou_prob > self.threshold
        }

# Use in LiveKit agent
detector = EOUDetector()
result = detector.check_eou("مرحبا كيف حالك")  # "Hello, how are you?"
if result['is_eou']:
    print("User finished speaking!")
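
The detector only scores text. In a real-time agent, the decision to respond usually also depends on how long the user has been silent. The following is a minimal sketch of one way to combine the two signals. `should_end_turn`, `min_silence_ms`, and `max_silence_ms` are hypothetical names with illustrative values; they are not part of this model or of LiveKit's API.

```python
def should_end_turn(eou_prob, silence_ms, threshold=0.7,
                    min_silence_ms=200, max_silence_ms=1500):
    """Decide whether the agent should take the turn.

    eou_prob:   EOU probability from the model for the current transcript
    silence_ms: time since the last detected speech, in milliseconds
    The timing defaults are assumptions for illustration only.
    """
    if silence_ms >= max_silence_ms:
        return True   # long pause: end the turn regardless of the model score
    if silence_ms < min_silence_ms:
        return False  # too early: the speaker may still be mid-phrase
    return eou_prob > threshold  # in between: defer to the model

print(should_end_turn(0.98, 300))   # True
print(should_end_turn(0.30, 300))   # False
print(should_end_turn(0.30, 2000))  # True
```

The upper bound on silence acts as a fallback so that the agent still responds when the model is unsure about an utterance.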

Model Architecture

  • Base Model: aubmindlab/bert-base-arabertv02
  • Task: Binary sequence classification
  • Input: Arabic text (up to 128 tokens)
  • Output: 2-class probability distribution [Non-EOU, EOU]
  • Parameters: 136M
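
To make the output format concrete, the sketch below shows how two raw logits map to the [Non-EOU, EOU] distribution via softmax, using plain Python instead of `torch.softmax`. The logit values are hypothetical and chosen only for illustration.

```python
import math

LABELS = ["Non-EOU", "EOU"]  # index order as listed above

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-2.0, 2.0]  # hypothetical model output for one utterance
probs = softmax(logits)
eou_probability = probs[LABELS.index("EOU")]
print(f"EOU probability: {eou_probability:.4f}")  # EOU probability: 0.9820
```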

Training Details

  • Framework: PyTorch + Transformers
  • Epochs: 3
  • Batch Size: 16
  • Learning Rate: 2e-5
  • Optimizer: AdamW
  • Training Time: ~3 hours on a T4 GPU
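
As a rough check on the scale of this run, the number of optimizer steps can be worked out from the figures above. This sketch assumes a single device, no gradient accumulation, and that the 80% training split of the 5,000 samples is used in full; none of these are stated in the card.

```python
import math

total_samples = 5000
train_fraction = 0.8   # from the 80/10/10 split
batch_size = 16
epochs = 3

train_samples = round(total_samples * train_fraction)
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(train_samples, steps_per_epoch, total_steps)  # 4000 250 750
```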

Intended Use

Primary Use Cases

  • ✅ Real-time voice assistants
  • ✅ Arabic conversational AI
  • ✅ Turn-taking detection in dialogues
  • ✅ LiveKit agent integration

Limitations

  • Trained primarily on Saudi dialect patterns
  • Requires text input (not raw audio)
  • Works best on short conversational segments (roughly 5-10 seconds of speech)
  • May need threshold tuning for specific use cases
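
Threshold tuning can be done by sweeping candidate thresholds over a labeled validation set and comparing precision, recall, and F1. The sketch below uses hypothetical probabilities and labels; `metrics_at_threshold` and `best_threshold` are illustrative helpers, not part of the released code.

```python
def metrics_at_threshold(probs, labels, threshold):
    """Return (precision, recall, f1) when predicting EOU for prob > threshold."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        predicted_eou = p > threshold
        if predicted_eou and y == 1:
            tp += 1
        elif predicted_eou and y == 0:
            fp += 1
        elif not predicted_eou and y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def best_threshold(probs, labels, candidates):
    """Pick the candidate threshold with the highest F1."""
    return max(candidates, key=lambda t: metrics_at_threshold(probs, labels, t)[2])

# Hypothetical validation data: model EOU probabilities and true labels (1 = EOU)
val_probs = [0.95, 0.85, 0.65, 0.60, 0.40, 0.20]
val_labels = [1, 1, 1, 0, 0, 0]

best = best_threshold(val_probs, val_labels, [0.3, 0.5, 0.7, 0.9])
print(best)  # 0.5
```

F1 is only one possible selection criterion. In a voice agent, a false EOU interrupts the user, so it may be preferable to choose the lowest threshold that still meets a target precision.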

Dataset

Training dataset available at: HossamEL-Dein/arabic-eou-dataset

Citation

@misc{arabic-eou-2024,
  author = {HossamEL-Dein},
  title = {Arabic End-of-Utterance Detection Model},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/HossamEL-Dein/arabic-eou-model}
}

License

Apache 2.0

Contact

For questions or issues, please open an issue on the model repository.
