naql — Arabic Model Format Converter

The Problem

Converting models can silently break Arabic tokenizers.

You fine-tune a model with Arabic support. You convert it from HuggingFace to GGUF for deployment. The conversion silently drops Arabic tokens, shifts vocab indices, or corrupts diacritics. Your model now outputs gibberish for Arabic while English still works fine. naql catches this before it reaches production.

Supported Formats

9 formats. Read, convert, validate.

naql reads model headers directly — no heavy ML framework required.

GGUF

llama.cpp quantized models (.gguf files).

SafeTensors

HuggingFace's safe tensor format (.safetensors files).

ONNX

Open Neural Network Exchange (.onnx files).

MLX

Apple's MLX framework (weights.npz + config.json).

JANG

Adaptive mixed-precision MLX. "The GGUF for MLX."

PyTorch

Native PyTorch checkpoints (.pt, .bin files).

HuggingFace

Model directories with config, tokenizer, and weights.

GPTQ

GPU-quantized models via AutoGPTQ. 4-bit inference.

AWQ

Activation-aware weight quantization. High-quality 4-bit.
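This page does not show how naql parses headers. The following is a minimal sketch of what reading headers without an ML framework can look like. Python's standard library is enough to pull tensor metadata out of a SafeTensors file and the fixed preamble out of a GGUF file. The function names are hypothetical and are not naql's API. The byte layouts follow the published SafeTensors and GGUF specifications, assuming little-endian files and GGUF version 2 or later.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the tensor index of a .safetensors file without reading tensor data.

    Layout: an 8-byte little-endian unsigned length N, then N bytes of UTF-8 JSON.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional string-to-string metadata
    return header  # {tensor_name: {"dtype": ..., "shape": [...], "data_offsets": [start, end]}}


def read_gguf_preamble(path):
    """Read the fixed 24-byte GGUF preamble: magic, version, tensor and KV counts.

    Assumes a little-endian file at GGUF version 2 or later, where both counts
    are uint64 (version 1 used uint32 counts and is rejected here).
    """
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        version, tensor_count, kv_count = struct.unpack("<IQQ", f.read(20))
    if version < 2:
        raise ValueError(f"GGUF version {version} is not handled by this sketch")
    return {"version": version, "tensor_count": tensor_count, "metadata_kv_count": kv_count}
```

Neither function touches tensor data. The SafeTensors header is bounded by its length prefix, and the GGUF preamble is a fixed 4 + 4 + 8 + 8 bytes.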

Arabic Check

Verify Arabic tokenizer preservation.

naql scans the vocab for Arabic tokens and reports coverage — before and after conversion.

Coverage Scan

Counts Arabic tokens in the vocabulary. Reports percentage, character coverage, and script distribution across the full vocab.

naql arabic model/
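The following is a minimal sketch of a coverage scan. It assumes that a token counts as Arabic if it contains at least one character from the Arabic Unicode blocks listed in the code. The function names and the three-way script split are illustrative and are not naql's implementation. The vocab is assumed to be a list of decoded Unicode strings.

```python
from collections import Counter

# Unicode blocks treated as "Arabic" in this sketch (an assumption; naql's
# exact ranges are not documented on this page).
ARABIC_RANGES = (
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    (0x08A0, 0x08FF),  # Arabic Extended-A
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFC),  # Arabic Presentation Forms-B (stops before U+FEFF, the BOM)
)


def is_arabic_char(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ARABIC_RANGES)


def script_of(token: str) -> str:
    """Classify a token by the first matching rule: Arabic, then ASCII Latin, else other."""
    if any(is_arabic_char(c) for c in token):
        return "arabic"
    if any(c.isascii() and c.isalpha() for c in token):
        return "latin"
    return "other"


def arabic_coverage(vocab) -> dict:
    """Count Arabic tokens in a vocabulary of decoded token strings."""
    tokens = list(vocab)
    scripts = Counter(script_of(t) for t in tokens)
    n_arabic = scripts["arabic"]  # Counter returns 0 for a missing key
    percent = 100.0 * n_arabic / len(tokens) if tokens else 0.0
    return {
        "total": len(tokens),
        "arabic": n_arabic,
        "percent": round(percent, 1),
        "scripts": dict(scripts),
    }
```

Byte-level BPE vocabularies store tokens as byte-mapped strings rather than readable Arabic. They would need decoding before a scan like this.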

Validation

Compares source and target tokenizers after conversion. Catches dropped tokens, shifted indices, and corrupted diacritics.

naql validate source/ target/
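The following sketches one way to express these checks. It assumes both tokenizers can be loaded as plain token-to-id dictionaries. "Dropped" and "shifted" follow directly from the description above. The "diacritics lost" category is a hypothetical heuristic, not necessarily what naql does: it applies when a token is missing but its undiacritized form survives. All names are illustrative.

```python
# Harakat checked in this sketch: U+064B (fathatan) through U+0652 (sukun).
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)}


def has_arabic(token: str) -> bool:
    # Main Arabic block only, to keep the sketch short.
    return any("\u0600" <= c <= "\u06ff" for c in token)


def strip_tashkeel(token: str) -> str:
    return "".join(c for c in token if c not in TASHKEEL)


def validate_vocab(src: dict, dst: dict) -> dict:
    """Compare two token -> id maps and report Arabic-specific regressions."""
    report = {"dropped": [], "shifted": [], "diacritics_lost": []}
    for token, idx in src.items():
        if not has_arabic(token):
            continue
        if token in dst:
            if dst[token] != idx:
                report["shifted"].append((token, idx, dst[token]))
            continue
        # Heuristic: the token is gone, but its undiacritized form exists,
        # so report it as stripped diacritics rather than a plain drop.
        bare = strip_tashkeel(token)
        if bare != token and bare in dst:
            report["diacritics_lost"].append(token)
        else:
            report["dropped"].append(token)
    report["ok"] = not (report["dropped"] or report["shifted"] or report["diacritics_lost"])
    return report
```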

28 Base Letters

Verifies all 28 Arabic letters are present in the tokenizer. Checks tashkeel (diacritics), Arabic digits, and common bigrams.

All 28 base letters covered
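The following is a sketch of the letter check. It assumes the 28 letters are the standard alphabet from alef to ya, excluding hamza, ta marbuta, and alef maqsura. It also assumes "present" means the character appears somewhere in the vocab. The names and the report shape are hypothetical.

```python
# The 28 base letters by code point: alef, ba, ta..ghain, fa..waw, ya.
# Hamza (U+0621), ta marbuta (U+0629) and alef maqsura (U+0649) are left out.
BASE_LETTERS = [
    chr(cp)
    for cp in (0x0627, 0x0628, *range(0x062A, 0x063B), *range(0x0641, 0x0649), 0x064A)
]
TASHKEEL = [chr(cp) for cp in range(0x064B, 0x0653)]  # 8 marks: fathatan .. sukun
ARABIC_INDIC_DIGITS = [chr(cp) for cp in range(0x0660, 0x066A)]  # ٠ .. ٩


def letter_report(vocab) -> dict:
    """Report which letters, marks and digits occur anywhere in the vocab."""
    chars = set("".join(vocab))
    missing = [c for c in BASE_LETTERS if c not in chars]
    found_marks = sum(c in chars for c in TASHKEEL)
    found_digits = sum(c in chars for c in ARABIC_INDIC_DIGITS)
    return {
        "letters": f"{len(BASE_LETTERS) - len(missing)}/{len(BASE_LETTERS)}",
        "missing_letters": missing,
        "tashkeel": f"{found_marks}/{len(TASHKEEL)}",
        "digits": f"{found_digits}/{len(ARABIC_INDIC_DIGITS)}",
    }
```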

Bigram Coverage

Tests the most common Arabic character bigrams against the tokenizer. Low coverage means the model will over-tokenize Arabic text.

94% bigram coverage
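The following sketches bigram coverage under one possible definition: a bigram is covered if it occurs inside at least one vocab token. The page does not list which bigrams naql tests, so the sample list is a placeholder for illustration only. The function names are hypothetical.

```python
# Placeholder sample for illustration; not naql's list. A real check would
# use the most frequent bigrams measured on an Arabic corpus.
SAMPLE_BIGRAMS = ["ال", "في", "من", "لا"]


def vocab_bigrams(vocab) -> set:
    """Every adjacent character pair occurring inside any vocab token."""
    return {tok[i:i + 2] for tok in vocab for i in range(len(tok) - 1)}


def bigram_coverage(vocab, bigrams=SAMPLE_BIGRAMS):
    """Return (percent covered, list of bigrams no token contains)."""
    present = vocab_bigrams(vocab)  # built once, so each lookup is O(1)
    missing = [b for b in bigrams if b not in present]
    percent = 100.0 * (len(bigrams) - len(missing)) / len(bigrams)
    return round(percent, 1), missing
```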

Conversion Matrix

What converts to what.

naql generates the right command for each conversion path and validates the output. You install the underlying conversion tools you need.

From / To	GGUF	SafeTensors	ONNX	MLX	PyTorch	HuggingFace	GPTQ	AWQ
GGUF	-	Yes	-	Yes	-	Yes	-	-
SafeTensors	Yes	-	Yes	Yes	Yes	Yes	-	-
ONNX	-	Yes	-	-	Yes	-	-	-
MLX	Yes	Yes	-	-	-	Yes	-	-
PyTorch	Yes	Yes	Yes	Yes	-	Yes	-	-
HuggingFace	Yes	Yes	Yes	Yes	Yes	-	Yes	Yes
GPTQ	Yes	-	-	-	-	-	-	-
AWQ	Yes	-	-	-	-	-	-	-
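The matrix can also be read as data. This sketch transcribes its "Yes" cells into a dictionary of direct conversion paths. The helper names are hypothetical and are not part of naql's interface.

```python
# Direct conversion paths, transcribed from the "Yes" cells of the matrix.
CONVERSIONS = {
    "GGUF": {"SafeTensors", "MLX", "HuggingFace"},
    "SafeTensors": {"GGUF", "ONNX", "MLX", "PyTorch", "HuggingFace"},
    "ONNX": {"SafeTensors", "PyTorch"},
    "MLX": {"GGUF", "SafeTensors", "HuggingFace"},
    "PyTorch": {"GGUF", "SafeTensors", "ONNX", "MLX", "HuggingFace"},
    "HuggingFace": {"GGUF", "SafeTensors", "ONNX", "MLX", "PyTorch", "GPTQ", "AWQ"},
    "GPTQ": {"GGUF"},
    "AWQ": {"GGUF"},
}


def can_convert(src: str, dst: str) -> bool:
    """True if the matrix lists a direct path from src to dst."""
    return dst in CONVERSIONS.get(src, set())


def sources_for(dst: str) -> set:
    """All formats with a direct path into dst (one column of the matrix)."""
    return {src for src, targets in CONVERSIONS.items() if dst in targets}
```

Like the table, this encodes single-step paths only.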

Commands

7 commands. Zero config.

inspect

Read model format, size, quantization, layer count, and vocab. Works on files and directories.

arabic

Scan tokenizer vocab for Arabic tokens. Reports coverage, character presence, and script distribution.

convert

Convert between formats. Generates the command, runs it, and validates Arabic tokenizer preservation.

validate

Compare source and target after conversion. Checks vocab alignment, token mapping, and Arabic integrity.

diff

Compare two models side by side. Shows field-level differences, including the Arabic token delta.

formats

List supported formats with detection rules, file extensions, and available conversion paths.

explain

Show how naql works — format detection pipeline, conversion strategy, and Arabic validation logic.
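The formats and explain commands refer to detection rules that this page does not spell out. The following is a hedged sketch that makes several assumptions:

- Single files are detected by extension.
- Directories are detected by their contents (MLX, HuggingFace, GPTQ, AWQ).
- GPTQ and AWQ are identified by reading quant_method from quantization_config in config.json, based on common HuggingFace conventions.

JANG is not handled, and all names are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical extension rules, taken from the file types listed on this page.
EXTENSIONS = {
    ".gguf": "GGUF",
    ".safetensors": "SafeTensors",
    ".onnx": "ONNX",
    ".pt": "PyTorch",
    ".bin": "PyTorch",
}


def detect_format(path) -> str:
    p = Path(path)
    if p.is_file():
        return EXTENSIONS.get(p.suffix.lower(), "unknown")
    if not p.is_dir():
        raise FileNotFoundError(p)
    config_file = p / "config.json"
    if not config_file.is_file():
        return "unknown"
    config = json.loads(config_file.read_text(encoding="utf-8"))
    # Quantized HF checkpoints commonly record their method here (assumption).
    method = (config.get("quantization_config") or {}).get("quant_method")
    if method in ("gptq", "awq"):
        return method.upper()
    # Check MLX before generic HuggingFace: both directories carry config.json.
    if (p / "weights.npz").is_file():
        return "MLX"
    return "HuggingFace"
```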

Get Started

Inspect. Convert. Validate.

# Install
$ pip install naql

# Inspect a model
$ naql inspect model.gguf
Format: GGUF
Quant: Q4_K_M
Params: 2.5B
Vocab: 151,936

# Check Arabic tokenizer
$ naql arabic model/
Arabic tokens: 4,217 (2.8%)
Letters: 28/28
Verdict: GOOD

# Convert to MLX
$ naql convert model/ --to mlx
Converting HuggingFace -> MLX... done.
Arabic check: PASS

# Validate the conversion
$ naql validate model/ model-mlx/
Vocab match: 151,936/151,936
Arabic tokens: preserved
Status: OK

Demo

See it in action.

Inspect, convert, and validate — all from the terminal.

naql demo — inspect, convert, and validate models from the terminal

naql نقل