Arabic End-of-Utterance (EOU) Detector

Detect when a speaker has finished their utterance in Arabic conversations.

This model is fine-tuned from AraBERT v2 for binary classification of Arabic text, determining whether an utterance is complete (EOU) or incomplete (No EOU).

Model Description

  • Model Type: BERT-based binary classifier
  • Base Model: aubmindlab/bert-base-arabertv2
  • Language: Arabic (ar)
  • Task: End-of-Utterance Detection
  • License: Apache 2.0

Performance

Metric            Value
Accuracy          90%
Precision (EOU)   0.90
Recall (EOU)      0.93
F1-Score (EOU)    0.92
Test Samples      1,001

Confusion Matrix

                 Predicted
                 No EOU   EOU
Actual No EOU    333      62    (84.3% correct)
Actual EOU       42       564   (93.1% correct)

Available Formats

This repository includes three model formats:

  1. PyTorch (pytorch_model.bin or model.safetensors) - For training and fine-tuning
  2. ONNX (model.onnx) - For optimized CPU/GPU inference (~2-3x faster)
  3. Quantized ONNX (model_quantized.onnx) - For production (75% smaller, 2-3x faster; see the sketch below)
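
For reference, a quantized file like the one in (3) can be produced from the full-precision ONNX export with onnxruntime's dynamic quantization. This is a minimal sketch of one plausible recipe, assuming model.onnx is in the working directory, not necessarily the exact recipe used for this repository:

from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamic INT8 quantization of the weights (activations stay float32);
# for BERT-sized models this typically shrinks the file by roughly 75%.
quantize_dynamic(
    model_input="model.onnx",
    model_output="model_quantized.onnx",
    weight_type=QuantType.QInt8,
)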

Quick Start

Installation

pip install transformers torch onnxruntime

PyTorch Inference

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "your-username/arabic-eou-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # disable dropout for deterministic inference

# Inference
def predict_eou(text: str):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    
    logits = outputs.logits
    probs = torch.softmax(logits, dim=-1)
    is_eou = torch.argmax(probs, dim=-1).item() == 1
    confidence = probs[0, 1].item()  # P(EOU), independent of the predicted class
    
    return is_eou, confidence

# Test
text = "ู…ุฑุญุจุง ูƒูŠู ุญุงู„ูƒ"
is_eou, conf = predict_eou(text)
print(f"Is EOU: {is_eou}, Confidence: {conf:.4f}")

ONNX Inference (Recommended for Production)

import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer

# Load model and tokenizer
model_name = "your-username/arabic-eou-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load ONNX model (use model_quantized.onnx for best performance)
session = ort.InferenceSession(
    "model_quantized.onnx",  # or "model.onnx"
    providers=['CPUExecutionProvider']
)

# Inference
def predict_eou(text: str):
    inputs = tokenizer(
        text,
        padding="max_length",
        max_length=512,
        truncation=True,
        return_tensors="np"
    )
    
    outputs = session.run(
        None,
        {
            'input_ids': inputs['input_ids'].astype(np.int64),
            'attention_mask': inputs['attention_mask'].astype(np.int64)
        }
    )
    
    logits = outputs[0]
    # Numerically stable softmax: subtract the row max before exponentiating
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.sum(np.exp(shifted), axis=-1, keepdims=True)
    is_eou = np.argmax(probs, axis=-1)[0] == 1
    confidence = float(probs[0, 1])
    
    return is_eou, confidence

# Test
text = "ู…ุฑุญุจุง ูƒูŠู ุญุงู„ูƒ"
is_eou, conf = predict_eou(text)
print(f"Is EOU: {is_eou}, Confidence: {conf:.4f}")

Use Cases

  • Voice Assistants: Detect when user has finished speaking
  • Conversational AI: Improve turn-taking in Arabic chatbots
  • LiveKit Agents: Custom turn detection for Arabic conversations
  • Speech Recognition: Post-processing for better utterance segmentation

Integration with LiveKit

from livekit.agents import AgentSession
from livekit.plugins.arabic_turn_detector import ArabicTurnDetector

# Download model from HuggingFace
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="your-username/arabic-eou-detector",
    filename="model_quantized.onnx"
)

# Create turn detector
turn_detector = ArabicTurnDetector(
    model_path=model_path,
    unlikely_threshold=0.7
)

# Use in agent
session = AgentSession(
    turn_detector=turn_detector,
    # ... other config
)

Training Details

Training Data

  • Dataset: Arabic EOU Detection (10,072 samples)
  • Train/Val/Test Split: 80/10/10 (see the sketch after this list)
  • Classes:
    • 0: Incomplete utterance (No EOU)
    • 1: Complete utterance (EOU)
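
The 80/10/10 split can be reproduced with two calls to scikit-learn's train_test_split. A sketch, assuming parallel texts and labels lists; the seed actually used for this model is not published:

from sklearn.model_selection import train_test_split

# Carve off 20%, then halve it into validation and test (10% each).
train_x, rest_x, train_y, rest_y = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)
val_x, test_x, val_y, test_y = train_test_split(
    rest_x, rest_y, test_size=0.5, stratify=rest_y, random_state=42
)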

Training Hyperparameters

  • Base Model: aubmindlab/bert-base-arabertv2
  • Learning Rate: 2e-5
  • Batch Size: 32
  • Epochs: 10
  • Optimizer: AdamW
  • Weight Decay: 0.01
  • Max Sequence Length: 512
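
These hyperparameters map directly onto Hugging Face TrainingArguments. A minimal sketch of a comparable fine-tuning run; train_dataset and eval_dataset are assumed to be pre-tokenized datasets, and this is not necessarily the exact training script:

from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "aubmindlab/bert-base-arabertv2", num_labels=2
)

args = TrainingArguments(
    output_dir="arabic-eou-detector",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    num_train_epochs=10,
    weight_decay=0.01,  # AdamW is the Trainer's default optimizer
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # assumed: tokenized HF Dataset objects
    eval_dataset=eval_dataset,
)
trainer.train()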

Preprocessing

  • AraBERT normalization (diacritics removal, character normalization)
  • Tokenization with AraBERT tokenizer
  • Padding to max length (512 tokens)
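
The normalization step is typically applied with the arabert package's ArabertPreprocessor before tokenization. A sketch (pip install arabert; tokenizer is the AraBERT tokenizer loaded in Quick Start):

from arabert.preprocess import ArabertPreprocessor

prep = ArabertPreprocessor(model_name="aubmindlab/bert-base-arabertv2")

raw = "مرحبا كيف حالك"
normalized = prep.preprocess(raw)  # removes diacritics, normalizes characters
inputs = tokenizer(normalized, padding="max_length", max_length=512,
                   truncation=True, return_tensors="pt")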

Limitations

  • Language: Optimized for Modern Standard Arabic (MSA)
  • Domain: Trained on conversational Arabic text
  • Sequence Length: Maximum 512 tokens
  • Dialects: May have reduced accuracy on dialectal Arabic

Citation

If you use this model, please cite:

@misc{arabic-eou-detector,
  author = {Your Name},
  title = {Arabic End-of-Utterance Detector},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/your-username/arabic-eou-detector}}
}

License

Apache 2.0

Acknowledgments

This model builds on AraBERT v2 (aubmindlab/bert-base-arabertv2) by AUB MIND Lab.

Contact

For issues or questions, please open an issue on the GitHub repository.
