Scan Arabic text for toxicity, hate speech, and spam. Dialect-aware. Fully offline.
13 filters · 3 severity levels · fully offline
Arabic-first content moderation
$ pip install sarih
The Problem
Arabic platforms need Arabic moderation.
Social apps, chatbots, UGC platforms. If your users write Arabic, you need moderation that understands Arabic. Not translated English rules. Actual Arabic understanding.
🌐
English-First Tools
Existing moderation APIs are built for English. They miss Arabic slang, dialect-specific insults, and culturally specific toxicity.
🇪🇬
Dialect Blindness
An insult in Egyptian Arabic looks nothing like the same insult in Gulf Arabic. MSA-only tools miss dialect-specific content entirely.
☁
Cloud Dependency
Sending user content to third-party APIs for moderation. Privacy concerns, latency, cost, and single point of failure.
🔒
Data Privacy
Sensitive user content leaving your infrastructure. Compliance nightmares. sarih runs entirely on your machine.
13 Filters
Every type of toxic content. Caught.
sarih runs 13 specialized filters on every input, each tuned for Arabic text patterns across all major dialects.
Profanity
Swear words and vulgar language across all Arabic dialects. Not just MSA dictionaries.
Hate Speech
Slurs and dehumanizing language targeting ethnic, religious, or social groups.
Spam
Repetitive text, promotional patterns, keyword stuffing, and bot-like content.
Adult Content
Sexually explicit language, innuendo, and inappropriate material.
Violence
Threats, incitement to harm, and graphic descriptions of violence.
PII
Phone numbers, emails, national IDs, and other personally identifiable information in Arabic text.
Misinformation
Common health and political misinformation patterns circulating in Arabic.
Severity
Three levels. Clear action for each.
Every flagged piece of content gets a severity level so you know exactly what to do with it.
BLOCK
Must remove. Clearly toxic, dangerous, or illegal content.
FLAG
Needs human review. Likely problematic but context-dependent.
REVIEW
Soft signal. May be fine, but worth a second look.
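The three levels map naturally onto pipeline actions. A minimal sketch of that routing in Python, where the level names come from sarih's docs but the action strings and the `route` function are hypothetical stand-ins for your own pipeline:

```python
# Map sarih's three severity levels to moderation actions.
# Level names (BLOCK/FLAG/REVIEW) are from the sarih docs;
# the actions here are illustrative, not part of sarih.
ACTIONS = {
    "BLOCK": "remove",        # must remove: clearly toxic content
    "FLAG": "human_review",   # likely problematic, context-dependent
    "REVIEW": "soft_signal",  # may be fine, worth a second look
}

def route(severity: str) -> str:
    """Return the moderation action for a severity level."""
    if severity not in ACTIONS:
        raise ValueError(f"unknown severity: {severity!r}")
    return ACTIONS[severity]

print(route("BLOCK"))  # remove
```

Keeping the mapping in one dictionary makes it easy to swap actions per deployment without touching the moderation calls.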
Dialect Aware
Five dialects. One tool.
The same insult gets expressed differently in each dialect. sarih catches all of them.
MSA
فصحى
Egyptian
مصري
Gulf
خليجي
Levantine
شامي
Moroccan
مغربي
In Action
Moderate. Detect. Protect.
Commands
6 commands. Zero config.
scan
Scan a JSONL file for toxic content. Rich table output with filter hits, severity, and matched terms.
check
Check a single text string. Returns filter results with severity level and detected dialect.
pipe
Read from stdin for pipeline integration. Works with cat, jq, and other Unix tools.
stats
Show moderation statistics for a scanned file. Breakdown by filter, severity, and dialect.
clean
Remove or redact flagged content. Use --output to write the cleaned file.
explain
Describe all 13 filters and 3 severity levels with examples.
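The kind of breakdown that `stats` prints can be sketched with a few Counters over flagged records. The record shapes below are illustrative only, not sarih's actual scan output format:

```python
from collections import Counter

# Illustrative flagged records; sarih's real output format may differ.
records = [
    {"filters": ["Profanity"], "severity": "BLOCK", "dialect": "Egyptian"},
    {"filters": ["Spam"], "severity": "FLAG", "dialect": "MSA"},
    {"filters": ["Profanity", "Hate Speech"], "severity": "BLOCK", "dialect": "Gulf"},
]

# Tally hits by filter (a record can trigger several), severity, dialect.
by_filter = Counter(f for r in records for f in r["filters"])
by_severity = Counter(r["severity"] for r in records)
by_dialect = Counter(r["dialect"] for r in records)

print(by_filter.most_common())
```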
Get Started
One install. Six commands.
# Install
$ pip install sarih

# Check a single text
$ sarih check "text to moderate"

# Scan a dataset
$ sarih scan data.jsonl

# Clean a dataset
$ sarih clean data.jsonl --output clean.jsonl

# View statistics
$ sarih stats data.jsonl

# Pipe from stdin
$ cat texts.jsonl | sarih pipe

# Learn about filters
$ sarih explain
As a Library
Import and moderate in two lines.
Use sarih directly in your Python code. No CLI needed.
from sarih import moderate
result = moderate("text to check")
print(result.severity)  # BLOCK, FLAG, or REVIEW
print(result.filters)   # list of triggered filters
print(result.dialect)   # detected dialect
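Building on the two-line API above, batch triage is a short loop. In this runnable sketch, `moderate` is a stub standing in for sarih's real function (swap in `from sarih import moderate` in real code); the result attributes mirror the docs, but the assumption that clean text yields a severity of `None` should be checked against sarih's actual return value:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Result:                 # mirrors the attributes shown above
    severity: Optional[str]
    filters: list = field(default_factory=list)
    dialect: str = "MSA"

def moderate(text: str) -> Result:
    """Stub for illustration only; replace with sarih's moderate()."""
    flagged = "spam" in text
    return Result("FLAG" if flagged else None,
                  ["Spam"] if flagged else [])

def triage(texts):
    """Split texts into kept / review / removed buckets by severity."""
    kept, review, removed = [], [], []
    for t in texts:
        r = moderate(t)
        if r.severity == "BLOCK":
            removed.append(t)
        elif r.severity in ("FLAG", "REVIEW"):
            review.append((t, r.filters))
        else:
            kept.append(t)
    return kept, review, removed

kept, review, removed = triage(["hello", "buy spam now"])
```

Because everything runs locally, this loop adds no network latency per text, which is the point of the offline design.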