Scan Arabic text for toxicity, hate speech, and spam. Dialect-aware. Fully offline.
13 filters · 3 severity levels · fully offline — Arabic-first content moderation for social apps, chatbots, and UGC platforms.
$ pip install sarih
Chapter I
The Problem
Arabic platforms need Arabic moderation.
English-First Tools Existing moderation APIs are built for English. They miss Arabic slang, dialect-specific insults, and culturally specific toxicity.
Dialect Blindness An insult in Egyptian Arabic looks nothing like the same insult in Gulf Arabic. Tools built only for Modern Standard Arabic (MSA) miss dialect-specific content entirely.
Cloud Dependency Sending user content to third-party APIs raises privacy concerns, adds latency and cost, and creates a single point of failure.
Data Privacy Sensitive user content that leaves your infrastructure becomes a compliance problem. sarih runs entirely on your machine.
Chapter II
13 Filters
Every type of toxic content. Caught.
Profanity Swear words and vulgar language across all Arabic dialects. Not just MSA dictionaries.
Hate Speech Slurs and dehumanizing language targeting ethnic, religious, or social groups.
Spam Repetitive text, promotional patterns, keyword stuffing, and bot-like content.
Adult Content Sexually explicit language, innuendo, and inappropriate material.
Violence Threats, incitement to harm, and graphic descriptions of violence.
PII Phone numbers, emails, national IDs, and other personally identifiable information in Arabic text.
Misinformation Common health and political misinformation patterns circulating in Arabic.
Chapter III
Severity Levels
Three levels. Clear action for each.
Level    Meaning
BLOCK    Must remove. Clearly toxic, dangerous, or illegal content.
FLAG     Needs human review. Likely problematic but context-dependent.
REVIEW   Soft signal. May be fine, but worth a second look.
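One way to turn the three levels into concrete actions is a small lookup table in application code. The sketch below is an illustration, not part of sarih: the `Action` enum, the `action_for` helper, and the specific action chosen for each level are hypothetical, and it assumes severities arrive as the plain strings "BLOCK", "FLAG", and "REVIEW".

```python
from enum import Enum


class Action(Enum):
    """Hypothetical application-side actions (not defined by sarih)."""
    REMOVE = "remove"          # take the content down immediately
    QUEUE_FOR_HUMAN = "queue"  # hold for a moderator's decision
    LOG_ONLY = "log"           # keep the content, record the soft signal
    ALLOW = "allow"            # nothing was triggered


# Assumption: severity labels are the strings named in the table above.
SEVERITY_TO_ACTION = {
    "BLOCK": Action.REMOVE,
    "FLAG": Action.QUEUE_FOR_HUMAN,
    "REVIEW": Action.LOG_ONLY,
}


def action_for(severity):
    """Map a severity label (or None for clean text) to an action."""
    if severity is None:
        return Action.ALLOW
    key = str(severity).upper()
    # Unknown labels fall back to human review rather than silently passing.
    return SEVERITY_TO_ACTION.get(key, Action.QUEUE_FOR_HUMAN)
```

Sending unknown labels to a human queue is a deliberate fail-safe choice in this sketch; a platform with different risk tolerance might prefer to allow or remove by default.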
Chapter IV
Dialect Aware
Five dialects. One tool.
Dialect      Arabic
MSA          فصحى
Egyptian     مصري
Gulf         خليجي
Levantine    شامي
Moroccan     مغربي
Chapter V
Commands
6 commands. Zero config.
scan Scan a JSONL file for toxic content
check Check a single text string
pipe Read from stdin for pipelines
stats Moderation statistics by filter and severity
clean Remove or redact flagged content
explain Describe all filters and severity levels
Chapter VI
Get Started
# Install
$ pip install sarih

# Check a single text
$ sarih check "text to moderate"

# Scan a dataset
$ sarih scan data.jsonl

# Clean a dataset
$ sarih clean data.jsonl --output clean.jsonl

# View statistics
$ sarih stats data.jsonl

# Pipe from stdin
$ cat texts.jsonl | sarih pipe

# Learn about filters
$ sarih explain
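The scan, clean, and stats commands all read a JSONL file, which is one JSON object per line. A minimal sketch for producing such a file from a list of strings follows. The `write_jsonl` helper is hypothetical, and the `"text"` field name is an assumption: the page does not say which key sarih reads, so check `sarih explain` or the package documentation before relying on it.

```python
import json


def write_jsonl(texts, path, field="text"):
    """Write one JSON object per line to `path`.

    `field` is an assumed key name; sarih's expected schema is not
    documented on this page.
    """
    with open(path, "w", encoding="utf-8") as fh:
        for text in texts:
            # ensure_ascii=False keeps Arabic readable instead of \u escapes.
            fh.write(json.dumps({field: text}, ensure_ascii=False) + "\n")
    return path
```

Calling `write_jsonl(["مرحبا بالعالم", "hello"], "data.jsonl")` produces a file that can then be passed to `sarih scan data.jsonl`, provided the field name matches what the tool expects.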
Chapter VII
As a Library
Import and moderate in two lines.
from sarih import moderate
result = moderate("text to check")
print(result.severity)  # BLOCK, FLAG, or REVIEW
print(result.filters)   # list of triggered filters
print(result.dialect)   # detected dialect
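For more than one string, the same call can be wrapped in a small loop that keeps only the texts that triggered something. The sketch below is an illustration under assumptions: `triage` is a hypothetical helper, it takes the moderation function as a parameter so it can be exercised without sarih installed, and it treats a missing or None `severity` as clean text, which this page does not confirm.

```python
def triage(texts, moderate_fn):
    """Return a record for every text whose result carries a severity.

    `moderate_fn` is any callable that returns an object with `severity`,
    `filters`, and `dialect` attributes, for example `sarih.moderate`.
    """
    flagged = []
    for text in texts:
        result = moderate_fn(text)
        severity = getattr(result, "severity", None)
        if severity is None:
            # Assumption: clean text is reported with no severity.
            continue
        flagged.append({
            "text": text,
            "severity": str(severity),
            "filters": list(getattr(result, "filters", None) or []),
            "dialect": getattr(result, "dialect", None),
        })
    return flagged
```

With sarih installed, this would be called as `triage(comments, moderate)` after `from sarih import moderate`; the returned list can then be sorted or grouped by severity before it reaches a review queue.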