🛡️ Arabic Text Detoxification Model

Ensemble Knowledge Distillation Approach


Transform toxic Arabic text into polite, neutral alternatives while preserving meaning

Model Demo | Architecture | Dataset | Results


📊 Architecture Overview

[Architecture diagram]

🎯 Model Description

This model performs text detoxification for the Arabic language, converting offensive, toxic, or aggressive text into neutral, polite alternatives while preserving the original semantic meaning.

Key Features

| Feature | Description |
|---------|-------------|
| 🏗️ Architecture | Bloom-1b7 (1.7B parameters) fine-tuned with ensemble distillation |
| 🌍 Language | Arabic (Modern Standard Arabic + dialects) |
| 📚 Training | Ensemble of 3 models → knowledge distillation → final model |
| ⚡ Hardware | Optimized for NVIDIA A100 40GB; works on consumer GPUs |
| 📏 Context | Up to 2048 tokens |

Ensemble Components

| Model | Parameters | Role | Source |
|-------|------------|------|--------|
| AraGPT2-Medium | 370M | Arabic language expert | AUB MIND Lab |
| Bloom-560m | 560M | Multilingual generalization | BigScience |
| Bloom-1b7 | 1.7B | High-capacity patterns | BigScience |

📈 Evaluation Results

| Metric | Score | Description |
|--------|-------|-------------|
| J-Score | 0.7129 | Joint metric (geometric mean) |
| STA | 0.9500 | Style Transfer Accuracy |
| SIM (ref) | 0.9995 | Similarity to reference |
| Fluency | 1.0000 | Grammatical correctness |

J-Score    ████████████████████████████░░░░░░░░░░  0.71
STA        ██████████████████████████████████████  0.95
SIM (ref)  ██████████████████████████████████████  1.00
Fluency    ██████████████████████████████████████  1.00

🚀 Quick Start

Installation

pip install transformers torch

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model
model_name = "ispromashka/arab-detoxification-isp"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.to("cuda")  # or "cpu"

def detoxify(text: str) -> str:
    """Convert toxic Arabic text to a neutral form."""
    # Prompt format used in training: "سام" ("toxic") labels the input,
    # "مهذب" ("polite") labels the rewrite the model should produce.
    prompt = f"سام: {text}\nمهذب:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.2,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
    )

    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Keep only the first line generated after the "مهذب:" label.
    return result.split("مهذب:")[-1].strip().split("\n")[0]

# Example
toxic_text = "أنت غبي جداً"  # "You are very stupid"
neutral_text = detoxify(toxic_text)
print(f"Input:  {toxic_text}")
print(f"Output: {neutral_text}")

💡 Examples

| Category | Toxic Input (سام) | Neutral Output (مهذب) |
|----------|-------------------|-----------------------|
| Insult | أنت غبي جداً ("You are very stupid") | ربما تحتاج إلى مزيد من الوقت للفهم ("Perhaps you need more time to understand") |
| Command | اخرس يا أحمق ("Shut up, you fool") | أرجو أن تكون أكثر هدوءاً ("Please be calmer") |
| Criticism | هذا العمل تافه وسخيف ("This work is trivial and silly") | العمل يمكن تطويره ("The work can be improved") |
| Threat | سأجعلك تندم ("I will make you regret this") | دعنا نحل هذا بسلام ("Let us settle this peacefully") |
| Contempt | أنت فاشل تماماً ("You are a complete failure") | النجاح يحتاج لمزيد من الجهد ("Success takes more effort") |
| Mockery | يا له من غبي ("What an idiot") | ربما لم يفهم جيداً ("Perhaps he did not understand well") |
| Blame | كل شيء خطؤك ("Everything is your fault") | نحتاج تحديد المسؤوليات ("We need to clarify responsibilities") |
| Appearance | منظرك سيء ("You look bad") | المظهر يمكن تحسينه ("The appearance can be improved") |

🔬 Methodology

Training Pipeline

┌──────────────────────────────────────────────────────────────┐
│                     STAGE 1: Base Models                     │
├──────────────────────────────────────────────────────────────┤
│  Train 3 specialized models independently on detox dataset   │
│  • AraGPT2-Medium (25 epochs)                                │
│  • Bloom-560m (25 epochs)                                    │
│  • Bloom-1b7 (20 epochs)                                     │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│                 STAGE 2: Ensemble Selection                  │
├──────────────────────────────────────────────────────────────┤
│  For each input, select best prediction using:               │
│  Sentence-BERT (paraphrase-multilingual-mpnet-base-v2)       │
│  Selection: argmax(cosine_similarity(pred, reference))       │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│               STAGE 3: Knowledge Distillation                │
├──────────────────────────────────────────────────────────────┤
│  Fine-tune fresh Bloom-1b7 on:                               │
│  • Original dataset (3000+ examples)                         │
│  • Ensemble best predictions (1500+ examples)                │
│  • Total: 4500+ training examples                            │
└──────────────────────────────────────────────────────────────┘
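
The Stage 2 selection rule above is simple enough to sketch in a few lines. This is a minimal illustration using sentence-transformers, not the authors' actual scoring script; the `select_best` helper and its signature are assumptions:

from sentence_transformers import SentenceTransformer, util

# Encoder named in Stage 2 of the pipeline.
encoder = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")

def select_best(predictions: list[str], reference: str) -> str:
    """Pick the ensemble prediction closest to the reference rewrite."""
    pred_emb = encoder.encode(predictions, convert_to_tensor=True)
    ref_emb = encoder.encode([reference], convert_to_tensor=True)
    # Cosine similarity of each candidate to the reference; keep the argmax.
    scores = util.cos_sim(pred_emb, ref_emb).squeeze(-1)
    return predictions[int(scores.argmax())]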

Evaluation Metrics

J-Score (Primary metric):

$J = \sqrt[3]{STA \times SIM \times FL}$

Where:

  • STA (Style Transfer Accuracy): Measures toxicity removal success
  • SIM (Semantic Similarity): Content preservation (Sentence-BERT cosine similarity)
  • FL (Fluency): Ratio of grammatically valid outputs
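
Note that the reported J-Score (0.7129) is lower than the geometric mean of the corpus-level averages above, which is consistent with J being computed per sample and then averaged. A minimal sketch under that assumption (the official scoring script is not included here, and the sample scores below are purely illustrative):

import numpy as np

def j_score(sta, sim, fl):
    """Per-sample joint score: geometric mean of STA, SIM, and FL."""
    return np.cbrt(np.asarray(sta) * np.asarray(sim) * np.asarray(fl))

# Hypothetical per-sample scores for three outputs (illustration only).
sta = [1.0, 0.0, 1.0]     # was the toxicity removed?
sim = [0.99, 0.97, 0.98]  # similarity to the reference rewrite
fl = [1.0, 1.0, 1.0]      # is the output fluent?
print(j_score(sta, sim, fl).mean())  # corpus J = mean of per-sample J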

📁 Dataset

Dataset used for training and evaluation:
ispromashka/arabic-detox-dataset

Composition

| Category | Examples | Description |
|----------|----------|-------------|
| Personal Insults | 30 | Direct personal attacks |
| Aggressive Commands | 20 | Hostile imperatives |
| Work Criticism | 25 | Professional negative feedback |
| Threats | 15 | Intimidation and warnings |
| Contempt | 15 | Expressions of superiority |
| Blame | 15 | Accusatory statements |
| Appearance Criticism | 15 | Physical/aesthetic insults |
| Mockery | 15 | Sarcastic belittling |
| Total Unique | 150 | — |
| Augmented (×20) | 3,000+ | Training examples |

Data Format

سام: {toxic_text}
مهذب: {neutral_text}<EOS>

Here سام ("toxic") labels the source text and مهذب ("polite") labels the target rewrite.
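
A minimal sketch of how one training string might be assembled in this format (the actual preprocessing script is not published; `format_example` is a hypothetical helper, and the EOS default assumes Bloom's `</s>` token):

def format_example(toxic: str, neutral: str, eos_token: str = "</s>") -> str:
    """Build one training string: toxic input, then the polite target."""
    # eos_token should match the tokenizer's EOS (e.g. tokenizer.eos_token).
    return f"سام: {toxic}\nمهذب: {neutral}{eos_token}"

print(format_example("أنت غبي جداً", "ربما تحتاج إلى مزيد من الوقت للفهم"))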

⚙️ Training Configuration

| Parameter | Base Models | Final Model |
|-----------|-------------|-------------|
| Hardware | NVIDIA A100 40GB | NVIDIA A100 40GB |
| Precision | BF16 | BF16 |
| Batch Size | 8–16 | 8 |
| Learning Rate | 2e-5 – 3e-5 | 1.5e-5 |
| Epochs | 20–25 | 15 |
| Optimizer | AdamW | AdamW |
| Scheduler | Cosine | Cosine |
| Warmup | 10% | 10% |
| Total Time | ~85 min | ~30 min |
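
For reproduction, the Final Model column maps roughly onto Hugging Face `TrainingArguments` as below. This is a hypothetical configuration inferred from the table, not the authors' published training script; `output_dir` is a placeholder:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bloom1b7-detox-final",  # placeholder path
    per_device_train_batch_size=8,
    learning_rate=1.5e-5,
    num_train_epochs=15,
    bf16=True,                          # BF16 precision (A100)
    optim="adamw_torch",                # AdamW
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                   # 10% warmup
)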

⚠️ Limitations

  • Language Coverage: Optimized for Modern Standard Arabic; dialectal performance may vary
  • Text Length: Best suited to short and medium-length texts (< 100 tokens)
  • Domain: Trained on general toxicity; domain-specific content may need fine-tuning
  • Context: Does not consider conversation history

📖 Citation

@misc{arabicdetox2024,
  author = {ispromashka},
  title = {Arabic Text Detoxification: Ensemble Knowledge Distillation Approach},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ispromashka/arab-detoxification-isp}
}

📄 License

This project is licensed under the MIT License; see the LICENSE file for details.


Made with ❤️ for the Arabic NLP community

GitHub
