🛡️ Arabic Text Detoxification Model
Ensemble Knowledge Distillation Approach
Transform toxic Arabic text into polite, neutral alternatives while preserving meaning
Model Demo | Architecture | Dataset | Results
📐 Architecture Overview
🎯 Model Description
This model performs text detoxification for Arabic: it converts offensive, toxic, or aggressive text into neutral, polite alternatives while preserving the original semantic meaning.
Key Features
| Feature | Description |
|---|---|
| 🏗️ Architecture | Bloom-1b7 (1.7B parameters) fine-tuned with ensemble distillation |
| 🌍 Language | Arabic (Modern Standard Arabic + dialects) |
| 🎓 Training | Ensemble of 3 models → Knowledge distillation → Final model |
| ⚡ Hardware | Optimized for NVIDIA A100 40GB, works on consumer GPUs |
| 📏 Context | Up to 2048 tokens |
Ensemble Components
| Model | Parameters | Role | Source |
|---|---|---|---|
| AraGPT2-Medium | 370M | Arabic Language Expert | AUB MIND Lab |
| Bloom-560m | 560M | Multilingual Generalization | BigScience |
| Bloom-1b7 | 1.7B | High Capacity Patterns | BigScience |
📊 Evaluation Results
| Metric | Score | Description |
|---|---|---|
| J-Score | 0.7129 | Joint metric (STA × SIM × FL, see Methodology) |
| STA | 0.9500 | Style Transfer Accuracy |
| SIM (ref) | 0.9995 | Similarity to reference |
| Fluency | 1.0000 | Grammatical correctness |
🚀 Quick Start
Installation
pip install transformers torch
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load model
model_name = "ispromashka/arab-detoxification-isp"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,  # fp16 on GPU, fp32 on CPU
)
model.to("cuda" if torch.cuda.is_available() else "cpu")
def detoxify(text: str) -> str:
    """Convert toxic Arabic text to neutral form."""
    prompt = f"سام: {text}\nمهذب:"  # "Toxic: {text} / Polite:" prompt template
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.2,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
    )
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Keep only the first line generated after the "مهذب:" (polite) marker
    return result.split("مهذب:")[-1].strip().split("\n")[0]
# Example
toxic_text = "أنت غبي جداً"  # "You are very stupid"
neutral_text = detoxify(toxic_text)
print(f"Input: {toxic_text}")
print(f"Output: {neutral_text}")
💡 Examples
| Category | Toxic Input (سام) | Neutral Output (مهذب) |
|---|---|---|
| Insult | أنت غبي جداً | ربما تحتاج إلى مزيد من الوقت للفهم |
| Command | اخرس يا أحمق | أرجو أن تكون أكثر هدوءاً |
| Criticism | هذا العمل تافه وسخيف | العمل يمكن تطويره |
| Threat | سأجعلك تندم | دعنا نحل هذا بسلام |
| Contempt | أنت فاشل تماماً | النجاح يحتاج لمزيد من الجهد |
| Mockery | يا لك من غبي | ربما لم نفهم جيداً |
| Blame | كل شيء خطؤك | نحتاج تحديد المسؤوليات |
| Appearance | منظرك سيء | المظهر يمكن تحسينه |
🔬 Methodology
Training Pipeline
┌────────────────────────────────────────────────────────────┐
│ STAGE 1: Base Models                                       │
├────────────────────────────────────────────────────────────┤
│ Train 3 specialized models independently on detox dataset  │
│  • AraGPT2-Medium (25 epochs)                              │
│  • Bloom-560m (25 epochs)                                  │
│  • Bloom-1b7 (20 epochs)                                   │
└────────────────────────────────────────────────────────────┘
                              ↓
┌────────────────────────────────────────────────────────────┐
│ STAGE 2: Ensemble Selection                                │
├────────────────────────────────────────────────────────────┤
│ For each input, select best prediction using:              │
│   Sentence-BERT (paraphrase-multilingual-mpnet-base-v2)    │
│   Selection: argmax(cosine_similarity(pred, reference))    │
└────────────────────────────────────────────────────────────┘
                              ↓
┌────────────────────────────────────────────────────────────┐
│ STAGE 3: Knowledge Distillation                            │
├────────────────────────────────────────────────────────────┤
│ Fine-tune fresh Bloom-1b7 on:                              │
│  • Original dataset (3000+ examples)                       │
│  • Ensemble best predictions (1500+ examples)              │
│  • Total: 4500+ training examples                          │
└────────────────────────────────────────────────────────────┘
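The Stage 2 selection rule above can be sketched in a few lines with the sentence-transformers library. This is a minimal illustration of argmax-by-cosine-similarity; the function name select_best is ours, not from the training code:

from sentence_transformers import SentenceTransformer, util

selector = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")

def select_best(candidates: list[str], reference: str) -> str:
    """Return the candidate closest to the reference by cosine similarity."""
    cand_emb = selector.encode(candidates, convert_to_tensor=True)
    ref_emb = selector.encode([reference], convert_to_tensor=True)
    scores = util.cos_sim(cand_emb, ref_emb).squeeze(-1)  # one score per candidate
    return candidates[int(scores.argmax())]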
Evaluation Metrics
J-Score (primary metric) is the per-sample product of the three components, averaged over the evaluation set:

J = (1/n) × Σ_i [ STA(y_i) × SIM(y_i) × FL(y_i) ]

Where:
- STA (Style Transfer Accuracy): measures toxicity-removal success
- SIM (Semantic Similarity): content preservation (Sentence-BERT cosine similarity)
- FL (Fluency): ratio of grammatically valid outputs
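A minimal sketch of this aggregation, assuming per-sample component scores in [0, 1] (the exact evaluation script may differ):

def j_score(sta, sim, fl):
    """J = mean over samples of STA_i * SIM_i * FL_i."""
    assert len(sta) == len(sim) == len(fl) > 0
    return sum(s * m * f for s, m, f in zip(sta, sim, fl)) / len(sta)

# Per-sample products explain why J can sit well below each corpus-level mean:
print(j_score([1, 1, 1, 0], [0.99, 0.98, 1.0, 0.97], [1, 1, 1, 1]))  # ≈ 0.7425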
📚 Dataset
Dataset used for training and evaluation:
ispromashka/arabic-detox-dataset
Composition
| Category | Examples | Description |
|---|---|---|
| Personal Insults | 30 | Direct personal attacks |
| Aggressive Commands | 20 | Hostile imperatives |
| Work Criticism | 25 | Professional negative feedback |
| Threats | 15 | Intimidation and warnings |
| Contempt | 15 | Expressions of superiority |
| Blame | 15 | Accusatory statements |
| Appearance Criticism | 15 | Physical/aesthetic insults |
| Mockery | 15 | Sarcastic belittling |
| Total Unique | 150 | – |
| Augmented (×20) | 3,000+ | Training examples |
Data Format
سام: {toxic_text}
مهذب: {neutral_text}<EOS>
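For illustration, this is how a (toxic, neutral) pair maps onto the template. The split and column names ("toxic", "neutral") are assumptions about the dataset schema, and </s> is Bloom's end-of-sequence token:

from datasets import load_dataset

ds = load_dataset("ispromashka/arabic-detox-dataset", split="train")  # split name assumed

def format_example(toxic: str, neutral: str, eos: str = "</s>") -> str:
    # Matches the prompt format used by detoxify() in Quick Start
    return f"سام: {toxic}\nمهذب: {neutral}{eos}"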
โ๏ธ Training Configuration
| Parameter | Base Models | Final Model |
|---|---|---|
| Hardware | NVIDIA A100 40GB | NVIDIA A100 40GB |
| Precision | BF16 | BF16 |
| Batch Size | 8–16 | 8 |
| Learning Rate | 2e-5 – 3e-5 | 1.5e-5 |
| Epochs | 20–25 | 15 |
| Optimizer | AdamW | AdamW |
| Scheduler | Cosine | Cosine |
| Warmup | 10% | 10% |
| Total Time | ~85 min | ~30 min |
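Expressed as Hugging Face TrainingArguments, the final-model column corresponds roughly to the following (a sketch of the table above, not the authors' actual training script; output_dir is a placeholder):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bloom-1b7-detox",   # placeholder
    per_device_train_batch_size=8,
    num_train_epochs=15,
    learning_rate=1.5e-5,
    optim="adamw_torch",            # AdamW
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,               # 10% warmup
    bf16=True,                      # BF16 precision (A100)
)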
⚠️ Limitations
- Language Coverage: Optimized for Modern Standard Arabic; dialectal performance may vary
- Text Length: Best for short to medium texts (< 100 tokens)
- Domain: Trained on general toxicity; domain-specific content may need fine-tuning
- Context: Does not consider conversation history
📝 Citation
@misc{arabicdetox2024,
author = {ispromashka},
title = {Arabic Text Detoxification: Ensemble Knowledge Distillation Approach},
year = {2024},
publisher = {HuggingFace},
url = {https://huggingface.co/ispromashka/arab-detoxification-isp}
}
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
Made with ❤️ for the Arabic NLP community