Summarize Arabic text. Extractive, offline, dialect-aware. No API. No model.
pip install mukhtasar
Many Arabic NLP pipelines need summarization, but existing tools require cloud APIs or large model downloads, or they don't handle Arabic morphology. mukhtasar uses TextRank + TF-IDF with 7 Arabic-aware features to extract the most important sentences. It has zero dependencies beyond Rich.
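TextRank ranks sentences by running PageRank over a graph whose edge weights are sentence similarities. In the sketch below, similarity is cosine similarity between TF-IDF vectors. It is a minimal illustration, not mukhtasar's implementation. It assumes whitespace tokenization, a damping factor of 0.85 and a fixed 50 iterations, and it omits the 7 Arabic-aware features. The function names are hypothetical, and `ratio` is assumed to mean the fraction of sentences kept.

```python
import math
from collections import Counter


def tfidf_vectors(sentences: list[list[str]]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per tokenized sentence."""
    n = len(sentences)
    # Document frequency: the number of sentences each term appears in.
    df = Counter(term for sent in sentences for term in set(sent))
    vectors = []
    for sent in sentences:
        tf = Counter(sent)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(sent)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def textrank(sentences: list[list[str]], damping: float = 0.85,
             iterations: int = 50) -> list[float]:
    """PageRank over a sentence graph weighted by TF-IDF cosine similarity."""
    n = len(sentences)
    if n == 0:
        return []
    vecs = tfidf_vectors(sentences)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours, split in
        # proportion to edge weight (isolated sentences pass nothing on).
        scores = [
            (1 - damping) / n + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences: list[str], ratio: float = 0.3) -> list[str]:
    """Keep the top `ratio` fraction of sentences, in their original order."""
    if not sentences:
        return []
    scores = textrank([s.split() for s in sentences])
    k = max(1, round(len(sentences) * ratio))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

The selected sentences are returned in document order rather than score order, which keeps an extractive summary readable.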
Summarize text passed as an argument or piped through stdin, for quick inline summarization.
Summarize a file (.txt, .jsonl). Control the summary length with the --ratio flag.
Summarize multiple documents with redundancy removal: near-duplicates (more than 80% overlap) are removed.
Show sentences ranked by importance, with a full feature breakdown.
ROUGE evaluation (ROUGE-1, ROUGE-2, ROUGE-L) against a reference summary.
How mukhtasar works: scoring weights, stemming, and sentence splitting.
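The redundancy removal and ROUGE evaluation listed above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not mukhtasar's implementation. The text does not say how "overlap" is measured, so the sketch assumes Jaccard similarity of whitespace-token sets. It reports ROUGE scores as F1 over whitespace tokens, with no stemming or normalization. All function names are hypothetical.

```python
from collections import Counter


# --- Redundancy removal (assumed measure: Jaccard overlap of token sets) ---

def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity of the two sentences' whitespace-token sets."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a and not set_b:
        return 1.0  # two empty sentences count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def dedup_sentences(sentences: list[str], threshold: float = 0.8) -> list[str]:
    """Drop a sentence if it overlaps an already kept one by more than `threshold`."""
    kept: list[str] = []
    for sent in sentences:
        if all(token_overlap(sent, k) <= threshold for k in kept):
            kept.append(sent)
    return kept


# --- ROUGE-1 / ROUGE-2 / ROUGE-L, reported as F1 ---

def _ngrams(tokens: list[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _f1(overlap: int, cand_total: int, ref_total: int) -> float:
    if overlap == 0:
        return 0.0
    precision, recall = overlap / cand_total, overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate: str, reference: str, n: int) -> float:
    cand, ref = _ngrams(candidate.split(), n), _ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    return _f1(overlap, sum(cand.values()), sum(ref.values()))


def _lcs_length(a: list[str], b: list[str]) -> int:
    """Longest common subsequence length via a row-by-row dynamic program."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    cand, ref = candidate.split(), reference.split()
    return _f1(_lcs_length(cand, ref), len(cand), len(ref))
```

For multiple documents, one plausible approach is to pool the sentences first, e.g. `dedup_sentences([s for doc in docs for s in doc])`. Because the first occurrence is kept, earlier documents take precedence.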
No cloud API. No model download. No numpy. No transformers. Only Rich for pretty terminal output. Light stemming turns كتابات into كتاب and المطورون into مطور. Smart sentence splitting handles Arabic commas, semicolons, bullets, and quoted speech.
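The light stemming and sentence splitting described above can both be sketched with the standard library alone. This is a minimal sketch, not mukhtasar's actual rules. The following are all assumptions for illustration:

- the affix lists
- the three-letter minimum stem
- the bullet characters
- the function names `light_stem` and `split_sentences`

The sketch does reproduce the two stemming examples above. The splitter also treats the Arabic comma (،) as sentence-internal rather than as a boundary, which is another assumption.

```python
# Hypothetical affix lists; mukhtasar's real rules may differ.
PREFIXES = sorted(["وال", "بال", "كال", "فال", "لل", "ال"], key=len, reverse=True)
SUFFIXES = sorted(["ات", "ون", "ين", "ان", "ها", "ية", "ة", "ه", "ي"],
                  key=len, reverse=True)
MIN_STEM = 3  # never strip an affix if fewer than 3 letters would remain

TERMINATORS = {".", "!", "?", "\u061f", "\u061b"}  # . ! ? plus Arabic ؟ and ؛
QUOTE_PAIRS = {"\u00ab": "\u00bb", '"': '"'}       # « » and straight quotes
BULLETS = "•-*"


def light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix, longest match first."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            word = word[:-len(s)]
            break
    return word


def _flush(buf: list[str], out: list[str]) -> None:
    """Emit the buffered text as a sentence, minus whitespace and leading bullets."""
    sentence = "".join(buf).strip().lstrip(BULLETS).strip()
    if sentence:
        out.append(sentence)
    buf.clear()


def split_sentences(text: str) -> list[str]:
    """Split on . ! ? ؟ ؛ and line breaks, but not inside quoted speech."""
    out: list[str] = []
    buf: list[str] = []
    closing = None  # the quote character that ends the current quotation
    for ch in text:
        if ch == "\n":
            _flush(buf, out)
            closing = None  # a stray quote mark never swallows later lines
            continue
        buf.append(ch)
        if closing is not None:
            if ch == closing:
                closing = None
                # A quotation that ends in a terminator also ends the sentence.
                if len(buf) >= 2 and buf[-2] in TERMINATORS:
                    _flush(buf, out)
        elif ch in QUOTE_PAIRS:
            closing = QUOTE_PAIRS[ch]
        elif ch in TERMINATORS:
            _flush(buf, out)
    _flush(buf, out)
    return out
```

Stemmed tokens such as `[light_stem(w) for w in sentence.split()]` are what a TF-IDF step would then count. This lets كتابات and كتاب contribute to the same term.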