Arabic Text Summarizer

mukhtasar مختصر

Summarize Arabic text. Extractive, offline, dialect-aware. No API. No model.

7 Features · 250+ Stopwords · 4 Dialects · 6 Commands
$ pip install mukhtasar
The Problem
Arabic summarization shouldn't need a cloud API.

Many Arabic NLP pipelines need summarization, but existing tools either require cloud APIs or massive model downloads, or they don't handle Arabic morphology. mukhtasar combines TextRank and TF-IDF with 7 Arabic-aware features to extract the most important sentences. It has zero dependencies beyond Rich.
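The core idea (TextRank over TF-IDF similarity) can be sketched in a few dozen lines of standard-library Python. The sketch makes several assumptions:

- sentences arrive already tokenized;
- IDF is smoothed;
- the damping factor is 0.85;
- the function names (`tfidf_vectors`, `cosine`, `textrank`) are hypothetical.

This is not mukhtasar's own code, and its tokenizer, stemmer and parameters may differ.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Sparse TF-IDF vector (term -> weight) for each tokenized sentence."""
    n = len(sentences)
    # Document frequency: how many sentences contain each term.
    df = Counter(term for sent in sentences for term in set(sent))
    vectors = []
    for sent in sentences:
        tf = Counter(sent)
        total = len(sent) or 1  # avoid dividing by zero on empty sentences
        vectors.append({
            # Smoothed IDF (assumption) keeps weights positive for common terms.
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def textrank(sentences, damping=0.85, iterations=50, tol=1e-6):
    """Weighted PageRank over the sentence-similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vectors = tfidf_vectors(sentences)
    # Undirected, weighted graph with no self-loops.
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on score in proportion to edge weight.
            incoming = sum(sim[j][i] / out_weight[j] * scores[j]
                           for j in range(n) if out_weight[j] > 0)
            new_scores.append((1 - damping) / n + damping * incoming)
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


if __name__ == "__main__":
    docs = [
        "الذكاء الاصطناعي يغير العالم",
        "الذكاء الاصطناعي يغير التعليم",
        "العالم يتغير بسرعة",
    ]
    for sentence, score in zip(docs, textrank([d.split() for d in docs])):
        print(f"{score:.3f}  {sentence}")
```

The sentence that shares the most weighted vocabulary with the others ends up most central. That centrality is the kind of signal a TextRank feature contributes.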

7-Feature Scoring
Arabic-aware sentence importance scoring.
Each sentence is scored across 7 weighted features designed for Arabic text morphology and structure.
TextRank

35% weight
  • Graph-based ranking
  • TF-IDF similarity
  • Sentence centrality
Position

15% weight
  • Lead-sentence boost
  • Paragraph awareness
Cue Words

15% weight
  • Arabic signal phrases
  • Importance markers
Title Similarity

15% weight
  • Title word overlap
  • Topic relevance
Proper Nouns

10% weight
  • Named entity signals
  • Arabic noun patterns
Numbers & Length

5% weight each
  • Numeric data presence
  • Sentence length score
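The seven weights above sum to 100%. Below is a hypothetical sketch of how they might combine into one score per sentence and a final selection. It assumes:

- every feature score is already normalized to [0, 1];
- `--ratio` means the fraction of sentences to keep;
- the simple position and title-overlap formulas are illustrative stand-ins;
- the helper names are invented for this example.

```python
import math

# Feature weights as listed above (35/15/15/15/10/5/5); they sum to 1.0.
WEIGHTS = {
    "textrank": 0.35,
    "position": 0.15,
    "cue_words": 0.15,
    "title_similarity": 0.15,
    "proper_nouns": 0.10,
    "numbers": 0.05,
    "length": 0.05,
}


def combined_score(features):
    """Weighted sum of feature scores; each score is assumed to lie in [0, 1]."""
    return sum(weight * features.get(name, 0.0) for name, weight in WEIGHTS.items())


def position_score(index, total):
    """Hypothetical lead-sentence boost: 1.0 for the first sentence, decaying linearly."""
    return 1.0 - index / total if total else 0.0


def title_similarity(tokens, title_tokens):
    """Fraction of title words that also appear in the sentence."""
    title = set(title_tokens)
    return len(title & set(tokens)) / len(title) if title else 0.0


def select(scores, ratio):
    """Indices of the top `ratio` fraction of sentences (at least one), in document order."""
    if not scores:
        return []
    k = max(1, math.ceil(len(scores) * ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

Returning the selected indices in document order keeps an extractive summary readable, because sentences appear in the sequence the author wrote them.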
Arabic NLP
Dialect-aware. Morphology-aware.
Built from the ground up for Arabic text processing across MSA and dialects.
250+ stopwords across four varieties:
MSA — Modern Standard Arabic
Gulf — Saudi, Emirati, Kuwaiti
Egyptian — Masri dialect
Levantine — Syrian, Lebanese, Jordanian
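Dialect-aware stopword filtering can be sketched as a union of per-variety word sets. In this sketch, MSA words are always removed and dialect sets are added on request. The word sets are small illustrative samples picked for this example, not mukhtasar's 250+ list. The helpers `stopword_set` and `content_tokens` are hypothetical.

```python
# Tiny illustrative samples only; NOT mukhtasar's actual stopword list.
STOPWORDS = {
    "msa": {"في", "من", "على", "إلى", "عن", "هذا", "هذه", "الذي", "التي", "أن"},
    "gulf": {"وش", "شلون", "الحين", "هذي", "ترى"},
    "egyptian": {"ده", "دي", "إزاي", "كده", "مش", "عشان"},
    "levantine": {"شو", "هيك", "هلق", "كتير", "هاد", "ليش"},
}


def stopword_set(dialects=()):
    """Union of MSA stopwords with those of the requested dialects."""
    words = set(STOPWORDS["msa"])
    for dialect in dialects:
        words |= STOPWORDS[dialect]
    return words


def content_tokens(text, dialects=()):
    """Whitespace-tokenize and drop stopwords for the chosen varieties."""
    stop = stopword_set(dialects)
    return [token for token in text.split() if token not in stop]
```

For example, `content_tokens("شو هيك الشي", ("levantine",))` keeps only `الشي`. Without the Levantine set, the two dialect words would survive and distort the TF-IDF weights.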
Commands
6 commands. Zero config.

text

Summarize text passed as an argument or via stdin, for quick inline summarization.

file

Summarize a file (.txt, .jsonl). Set the summary ratio with the --ratio flag.

multi

Summarize multiple documents with redundancy removal: sentences with more than 80% overlap are deduplicated.

score

Show sentences ranked by importance with full feature breakdown.

eval

Evaluate a summary against a reference summary with ROUGE-1, ROUGE-2, and ROUGE-L.

explain

Explain how mukhtasar works: scoring weights, stemming, and sentence splitting.
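The metrics behind `eval` have standard definitions:

- ROUGE-N counts clipped n-gram overlap between summary and reference.
- ROUGE-L uses their longest common subsequence.

The stdlib sketch below reports the F1 variant of each and assumes tokens are already split (ideally stemmed, so morphological variants match). mukhtasar may also report precision and recall. The sketch also includes an `is_redundant` check for the `multi` command's 80% threshold. Its overlap measure is a hypothetical choice: shared unique tokens divided by the shorter sentence's unique tokens, because the exact measure isn't specified above.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, ref_total, sys_total):
    """Harmonic mean of precision (vs. summary) and recall (vs. reference)."""
    if overlap == 0 or ref_total == 0 or sys_total == 0:
        return 0.0
    recall = overlap / ref_total
    precision = overlap / sys_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(reference, summary, n):
    """ROUGE-N F1 with clipped n-gram counts."""
    ref, hyp = ngrams(reference, n), ngrams(summary, n)
    overlap = sum((ref & hyp).values())  # Counter & keeps the minimum count per n-gram
    return f1(overlap, sum(ref.values()), sum(hyp.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, O(len(a) * len(b)) time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(reference, summary):
    """ROUGE-L F1 based on the longest common subsequence."""
    return f1(lcs_length(reference, summary), len(reference), len(summary))


def is_redundant(candidate, kept, threshold=0.8):
    """Hypothetical dedup rule: shared unique tokens / unique tokens of the shorter sentence."""
    cand = set(candidate)
    for other in kept:
        other_set = set(other)
        smaller = min(len(cand), len(other_set))
        if smaller and len(cand & other_set) / smaller > threshold:
            return True
    return False
```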

Dependencies
Zero beyond Rich.

No cloud API. No model download. No numpy. No transformers. Only Rich, for pretty terminal output.

Light stemming turns كتابات ("writings") into كتاب and المطورون ("the developers") into مطور ("developer"). Smart sentence splitting handles Arabic commas, semicolons, bullets, and quoted speech.
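A minimal sketch of light stemming and sentence splitting along these lines follows. It reproduces the two stemming examples above, but everything else is an assumption for illustration, not mukhtasar's rules:

- the prefix and suffix lists;
- the 3-letter minimum stem;
- the bullet pattern;
- the quote handling;
- the choice not to treat the Arabic comma (،) as a sentence boundary.

```python
import re

PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال")  # longest first (assumed list)
SUFFIXES = ("ات", "ون", "ين", "ان", "ها", "ة")  # assumed list
MIN_STEM = 3  # never strip an affix if fewer than 3 letters would remain

TERMINATORS = set(".!?؟؛")  # includes the Arabic question mark and semicolon
BULLET = re.compile(r"^\s*(?:[-•*]|\d+[.)])\s+")  # "- ", "• ", "1. ", "2) "


def light_stem(word):
    """Strip at most one known prefix and one known suffix."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            word = word[:-len(suffix)]
            break
    return word


def _flush(buf, out):
    """Move the buffered characters into `out` as one stripped sentence."""
    sentence = "".join(buf).strip()
    if sentence:
        out.append(sentence)
    buf.clear()


def split_sentences(text):
    """Split on terminators outside quotes; each line or bullet starts a new sentence."""
    sentences = []
    for line in text.splitlines():
        line = BULLET.sub("", line)
        buf, depth, in_ascii_quote = [], 0, False
        for i, ch in enumerate(line):
            buf.append(ch)
            if ch == "«":
                depth += 1
            elif ch == "»":
                depth = max(0, depth - 1)
            elif ch == '"':
                in_ascii_quote = not in_ascii_quote
            elif ch in TERMINATORS and depth == 0 and not in_ascii_quote:
                # Keep decimals such as 3.5 intact.
                if ch == "." and i + 1 < len(line) and line[i + 1].isdigit():
                    continue
                _flush(buf, sentences)
        _flush(buf, sentences)
    return sentences
```

Stripping only one prefix and one suffix, with a minimum stem length, is what makes the stemmer "light". It conflates common inflections without attempting full root extraction.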

Get Started
Four lines to summarize Arabic text.
# Install
$ pip install mukhtasar

# Summarize inline text
$ mukhtasar text "الذكاء الاصطناعي يغير العالم..." --title "الذكاء الاصطناعي"

# Summarize a file
$ mukhtasar file article.txt --ratio 0.2

# ROUGE evaluation
$ mukhtasar eval --reference gold.txt --summary output.txt