Arabic Text Summarizer

mukhtasar (مختصر)

Summarize Arabic text. Extractive, offline, dialect-aware. No API. No model.

7 Features · 250+ Stopwords · 4 Dialects · 6 Commands

$ pip install mukhtasar

The Problem

Arabic summarization shouldn't need a cloud API.

Most Arabic NLP pipelines need summarization at some point. Existing tools either require cloud APIs or massive model downloads, or they ignore Arabic morphology. mukhtasar combines TextRank and TF-IDF with 7 Arabic-aware features to extract the most important sentences. Zero dependencies beyond Rich.
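To make the TextRank and TF-IDF idea concrete, here is a minimal pure-Python sketch. It is not mukhtasar's actual code: the function names, the smoothed IDF formula, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for illustration. Each sentence becomes a TF-IDF vector, cosine similarity gives the edge weights of a sentence graph, and a PageRank-style iteration ranks the sentences by centrality.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Build one sparse TF-IDF vector (a dict) per tokenized sentence."""
    n = len(sentences)
    # Document frequency: in how many sentences does each term occur?
    df = Counter(term for sent in sentences for term in set(sent))
    vectors = []
    for sent in sentences:
        tf = Counter(sent)
        vectors.append({
            # Smoothed IDF (an assumption) keeps every weight positive.
            term: (count / len(sent)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Return one centrality score per sentence (hypothetical helper)."""
    n = len(sentences)
    if n == 0:
        return []
    vecs = tfidf_vectors(sentences)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


# Usage: sentences are pre-tokenized lists of words.
docs = [
    ["الذكاء", "الاصطناعي", "يغير", "العالم"],
    ["الذكاء", "الاصطناعي", "يطور", "الطب"],
    ["الطقس", "جميل", "اليوم"],
]
for sentence, score in zip(docs, textrank(docs)):
    print(round(score, 3), " ".join(sentence))
```

The third sentence shares no words with the other two, so it receives no incoming weight and ends up with the lowest score.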

7-Feature Scoring

Arabic-aware sentence importance scoring.

Each sentence is scored across 7 weighted features designed for Arabic morphology and text structure.

🌐

TextRank

35% weight

Graph-based ranking
TF-IDF similarity
Sentence centrality

📍

Position

15% weight

Lead sentences boost
Paragraph awareness

🔍

Cue Words

15% weight

Arabic signal phrases
Importance markers

📑

Title Similarity

15% weight

Title word overlap
Topic relevance

Proper Nouns

10% weight

Named entity signals
Arabic noun patterns

Numbers & Length

5% weight each

Numeric data presence
Sentence length score
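The weights listed for the seven features sum to 100%, so the final score can be read as a weighted average. The sketch below shows that combination under two assumptions: each feature has already been normalized to the range 0 to 1, and the dictionary keys and the `combine` function are hypothetical names, not mukhtasar's API.

```python
# Weights as listed for the seven features (they sum to 1.0).
WEIGHTS = {
    "textrank": 0.35,
    "position": 0.15,
    "cue_words": 0.15,
    "title_similarity": 0.15,
    "proper_nouns": 0.10,
    "numbers": 0.05,
    "length": 0.05,
}


def combine(features):
    """Weighted sum of per-sentence feature values, each assumed in [0, 1].

    Missing features count as 0.0 (an assumption for this sketch).
    """
    return sum(weight * features.get(name, 0.0)
               for name, weight in WEIGHTS.items())


# Usage: a lead sentence that is central in the graph and mentions the title.
example = {"textrank": 0.8, "position": 1.0, "title_similarity": 0.5}
print(round(combine(example), 3))
```

Because TextRank carries the largest weight, a sentence that is central in the graph can outrank a lead sentence that scores well only on position.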

Arabic NLP

Dialect-aware. Morphology-aware.

Built from the ground up for Arabic text processing across MSA and dialects.

250+ stopwords

MSA — Modern Standard Arabic

Gulf — Saudi, Emirati, Kuwaiti

Egyptian — Masri dialect

Levantine — Syrian, Lebanese, Jordanian

Commands

6 commands. Zero config.

text

Summarize text from argument or stdin. Quick inline summarization.

file

Summarize a file (.txt, .jsonl). Control ratio with --ratio flag.

multi

Summarize multiple documents with redundancy removal: content with more than 80% overlap is deduplicated.
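One way to picture the redundancy removal is a greedy filter over candidate sentences. This is only a sketch: how mukhtasar measures overlap is not stated here, so the overlap coefficient on whitespace tokens, the sentence-level granularity, and the `overlap` and `drop_redundant` names are all assumptions.

```python
def overlap(a, b):
    """Share of the shorter sentence's unique words also found in the other."""
    words_a, words_b = set(a.split()), set(b.split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / min(len(words_a), len(words_b))


def drop_redundant(sentences, threshold=0.8):
    """Keep a sentence only if it overlaps no kept sentence above threshold."""
    kept = []
    for sentence in sentences:
        if all(overlap(sentence, other) <= threshold for other in kept):
            kept.append(sentence)
    return kept


# Usage: the second sentence repeats the first with one extra word.
candidates = [
    "الذكاء الاصطناعي يغير العالم",
    "الذكاء الاصطناعي يغير العالم بسرعة",
    "الطقس جميل اليوم",
]
for sentence in drop_redundant(candidates):
    print(sentence)
```

The filter is order-dependent: the first of two near-duplicates is the one that survives, so candidates would normally be sorted by importance before filtering.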

score

Show sentences ranked by importance with full feature breakdown.

eval

ROUGE evaluation (ROUGE-1, ROUGE-2, ROUGE-L) against reference summary.

explain

How mukhtasar works — scoring weights, stemming, sentence splitting.
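For readers unfamiliar with the metrics that eval reports, the sketch below computes ROUGE-N and ROUGE-L in their standard form: clipped n-gram overlap for ROUGE-N and longest common subsequence for ROUGE-L. Whitespace tokenization and the function names are assumptions for illustration. mukhtasar may tokenize or stem before scoring, so its numbers can differ.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, ref_total, hyp_total):
    """F1 from an overlap count and the reference/summary totals."""
    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / hyp_total if hyp_total else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rouge_n(reference, summary, n=1):
    """ROUGE-N F1 with clipped n-gram counts."""
    ref = ngrams(reference.split(), n)
    hyp = ngrams(summary.split(), n)
    overlap = sum((ref & hyp).values())  # Counter & Counter takes the minimum
    return f1(overlap, sum(ref.values()), sum(hyp.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(reference, summary):
    """ROUGE-L F1 based on the longest common subsequence."""
    ref, hyp = reference.split(), summary.split()
    return f1(lcs_length(ref, hyp), len(ref), len(hyp))


# Usage: the summary gets three of the four reference words right.
gold = "الذكاء الاصطناعي يغير العالم"
output = "الذكاء الاصطناعي يغير الطب"
print(rouge_n(gold, output, 1), rouge_n(gold, output, 2), rouge_l(gold, output))
```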

Dependencies

Zero beyond Rich.

No cloud API. No model download. No numpy. No transformers. Only Rich for pretty terminal output. Light stemming turns كتابات into كتاب and المطورون into مطور. Smart sentence splitting handles Arabic commas, semicolons, bullets, and quoted speech.
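Light stemming of this kind can be approximated by stripping a few common prefixes and suffixes. The sketch below reproduces the two examples above, but the affix lists, the minimum stem length, and the `light_stem` name are assumptions, and mukhtasar's own rules are likely more complete.

```python
# Hypothetical affix lists, longest prefixes first; real stemmers use more.
PREFIXES = ("وال", "بال", "ال")
SUFFIXES = ("ون", "ين", "ات", "ة")


def light_stem(word, min_len=3):
    """Strip at most one prefix and one suffix, keeping at least min_len letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


# Usage: the two examples from the text.
for word in ("كتابات", "المطورون"):
    print(word, "->", light_stem(word))
```

The minimum-length guard keeps short words from being stripped down to fragments, which is the usual trade-off in light stemming: it conflates fewer forms than root extraction but rarely damages a word.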

Get Started

Four lines to summarize Arabic text.

# Install
$ pip install mukhtasar

# Summarize inline text
$ mukhtasar text "الذكاء الاصطناعي يغير العالم..." --title "الذكاء الاصطناعي"

# Summarize a file
$ mukhtasar file article.txt --ratio 0.2

# ROUGE evaluation
$ mukhtasar eval --reference gold.txt --summary output.txt