Scan Arabic text for toxicity, hate speech, and spam. Dialect-aware. Fully offline.
13 filters · 3 severity levels · fully offline
Arabic-first content moderation
$ pip install sarih
The Problem
Arabic platforms need Arabic moderation.
Social apps, chatbots, UGC platforms. If your users write Arabic, you need moderation that understands Arabic. Not translated English rules. Actual Arabic understanding.
🌐
English-First Tools
Existing moderation APIs are built for English. They miss Arabic slang, dialect-specific insults, and culturally specific toxicity.
🇪🇬
Dialect Blindness
An insult in Egyptian Arabic looks nothing like the same insult in Gulf Arabic. MSA-only tools miss dialect-specific content entirely.
☁
Cloud Dependency
Sending user content to third-party APIs for moderation. Privacy concerns, latency, cost, and single point of failure.
🔒
Data Privacy
Sensitive user content leaving your infrastructure. Compliance nightmares. sarih runs entirely on your machine.
13 Filters
Every type of toxic content. Caught.
sarih runs 13 specialized filters on every input, each tuned for Arabic text patterns across all major dialects.
Profanity
Swear words and vulgar language across all Arabic dialects. Not just MSA dictionaries.
Hate Speech
Slurs and dehumanizing language targeting ethnic, religious, or social groups.
Spam
Repetitive text, promotional patterns, keyword stuffing, and bot-like content.
Adult Content
Sexually explicit language, innuendo, and inappropriate material.
Violence
Threats, incitement to harm, and graphic descriptions of violence.
PII
Phone numbers, emails, national IDs, and other personally identifiable information in Arabic text.
Misinformation
Common health and political misinformation patterns circulating in Arabic.
Severity
Three levels. Clear action for each.
Every flagged piece of content gets a severity level so you know exactly what to do with it.
BLOCK
Must remove. Clearly toxic, dangerous, or illegal content.
FLAG
Needs human review. Likely problematic but context-dependent.
REVIEW
Soft signal. May be fine, but worth a second look.
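The three levels map naturally onto pipeline actions. A minimal sketch of that routing in Python, where the level names come from sarih's docs but the action strings and the `route` function are hypothetical stand-ins for your own pipeline:

```python
# Map sarih's three severity levels to moderation actions.
# Level names (BLOCK/FLAG/REVIEW) are from the sarih docs;
# the actions here are illustrative, not part of sarih.
ACTIONS = {
    "BLOCK": "remove",        # must remove: clearly toxic content
    "FLAG": "human_review",   # likely problematic, context-dependent
    "REVIEW": "soft_signal",  # may be fine, worth a second look
}

def route(severity: str) -> str:
    """Return the moderation action for a severity level."""
    if severity not in ACTIONS:
        raise ValueError(f"unknown severity: {severity!r}")
    return ACTIONS[severity]

print(route("BLOCK"))  # remove
```

Keeping the mapping in one dictionary makes it easy to swap actions per deployment without touching the moderation calls.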
Dialect Aware
Five dialects. One tool.
The same insult gets expressed differently in each dialect. sarih catches all of them.
MSA
فصحى
Egyptian
مصري
Gulf
خليجي
Levantine
شامي
Moroccan
مغربي
In Action
Moderate. Detect. Protect.
Commands
6 commands. Zero config.
scan
Scan a JSONL file for toxic content. Rich table output with filter hits, severity, and matched terms.
check
Check a single text string. Returns filter results with severity level and detected dialect.
pipe
Read from stdin for pipeline integration. Works with cat, jq, and other Unix tools.
stats
Show moderation statistics for a scanned file. Breakdown by filter, severity, and dialect.
clean
Remove or redact flagged content. Use --output to write the cleaned file.
explain
Describe all 13 filters and 3 severity levels with examples.
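The kind of breakdown that `stats` prints can be sketched with a few Counters over flagged records. The record shapes below are illustrative only, not sarih's actual scan output format:

```python
from collections import Counter

# Illustrative flagged records; sarih's real output format may differ.
records = [
    {"filters": ["Profanity"], "severity": "BLOCK", "dialect": "Egyptian"},
    {"filters": ["Spam"], "severity": "FLAG", "dialect": "MSA"},
    {"filters": ["Profanity", "Hate Speech"], "severity": "BLOCK", "dialect": "Gulf"},
]

# Tally hits by filter (a record can trigger several), severity, dialect.
by_filter = Counter(f for r in records for f in r["filters"])
by_severity = Counter(r["severity"] for r in records)
by_dialect = Counter(r["dialect"] for r in records)

print(by_filter.most_common())
```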
Get Started
One install. Six commands.
# Install
$ pip install sarih

# Check a single text
$ sarih check "text to moderate"

# Scan a dataset
$ sarih scan data.jsonl

# Clean a dataset
$ sarih clean data.jsonl --output clean.jsonl

# View statistics
$ sarih stats data.jsonl

# Pipe from stdin
$ cat texts.jsonl | sarih pipe

# Learn about filters
$ sarih explain
As a Library
Import and moderate in two lines.
Use sarih directly in your Python code. No CLI needed.
from sarih import moderate
result = moderate("text to check")
print(result.severity)  # BLOCK, FLAG, or REVIEW
print(result.filters)   # list of triggered filters
print(result.dialect)   # detected dialect
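Building on the two-line API above, batch triage is a short loop. In this runnable sketch, `moderate` is a stub standing in for sarih's real function (swap in `from sarih import moderate` in real code); the result attributes mirror the docs, but the assumption that clean text yields a severity of `None` should be checked against sarih's actual return value:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Result:                 # mirrors the attributes shown above
    severity: Optional[str]
    filters: list = field(default_factory=list)
    dialect: str = "MSA"

def moderate(text: str) -> Result:
    """Stub for illustration only; replace with sarih's moderate()."""
    flagged = "spam" in text
    return Result("FLAG" if flagged else None,
                  ["Spam"] if flagged else [])

def triage(texts):
    """Split texts into kept / review / removed buckets by severity."""
    kept, review, removed = [], [], []
    for t in texts:
        r = moderate(t)
        if r.severity == "BLOCK":
            removed.append(t)
        elif r.severity in ("FLAG", "REVIEW"):
            review.append((t, r.filters))
        else:
            kept.append(t)
    return kept, review, removed

kept, review, removed = triage(["hello", "buy spam now"])
```

Because everything runs locally, this loop adds no network latency per text, which is the point of the offline design.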