فتاح — Fattah-2.5B
نموذج لغوي مصري مبني على Qwen3 بتقنية Depth-Up Scaling
Egyptian Arabic LLM Built on Qwen3 with Depth-Up Scaling
Overview
Fattah (فتاح — meaning "the opener" or "the one who opens doors") is a 2.5B parameter Large Language Model specialized for Egyptian Arabic, the most widely spoken Arabic dialect with over 100 million native speakers.
Fattah is built through a novel three-stage pipeline:
- Depth-Up Scaling (DUS) — expanding Qwen3-1.7B from 28 to 40 transformer layers
- Continual Pre-Training (CPT) — trained on a ~8.59B token Egyptian Arabic corpus, processing 5.51B tokens (64.1% of the full dataset)
- Supervised Fine-Tuning (SFT) — 400K Egyptian Arabic instruction-response pairs
⚠️ Note: This is the pre-DPO version (CPT + SFT only). A DPO-aligned version (Fattah-2.5B-v2) is coming soon with improved factual accuracy, reduced hallucination, and better instruction following.
Model Details
| Property | Value |
|---|---|
| Model Name | Fattah-2.5B |
| Base Model | Qwen/Qwen3-1.7B-Base |
| Architecture | Qwen3 (expanded via DUS) |
| Parameters | 2,635,771,904 (~2.64B) |
| Transformer Layers | 40 (expanded from 28) |
| Hidden Size | 2048 |
| Context Length | 64K tokens (YaRN extended) |
| Language | Egyptian Arabic (primary), MSA, English |
| License | Apache 2.0 |
| Training Compute | 2× NVIDIA A6000 48GB |
Training Pipeline
Stage 1 — Depth-Up Scaling (DUS)
Starting from Qwen/Qwen3-1.7B-Base, we applied Depth-Up Scaling surgery — the same technique used in SOLAR-10.7B — to expand the model from 28 to 40 transformer layers, increasing parameter count from 1.7B to ~2.5B without any training.
```
Qwen3-1.7B-Base (28 layers)
        ↓ DUS Surgery
Fattah-DUS (40 layers, ~2.5B)
```
Layer expansion strategy: concatenate layers [0-23] + layers [4-27], creating a deeper model that inherits the base model's knowledge while providing additional capacity for Egyptian Arabic adaptation.
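The surgery itself is just a copy-and-concatenate over the decoder layer list, with every weight inherited from the base model. A minimal sketch (the `front_end`/`back_start` indices below are illustrative stand-ins, not necessarily the exact split used for Fattah):

```python
import copy

def depth_up_scale(layers, front_end, back_start):
    # Keep the first `front_end` layers, then append deep copies of
    # layers[back_start:]. No weight is newly initialized; the deeper
    # stack inherits everything from the base model.
    return list(layers[:front_end]) + [copy.deepcopy(l) for l in layers[back_start:]]

# Toy demo: integers stand in for the 28 Qwen3 decoder layers.
base = list(range(28))
deeper = depth_up_scale(base, front_end=24, back_start=12)
print(len(deeper))  # 40
```

Because the duplicated middle layers repeat computation the model never trained for, quality dips immediately after surgery and is recovered during CPT (see the training journey table below).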
Stage 2 — Continual Pre-Training (CPT)
| Parameter | Value |
|---|---|
| Dataset | Custom Egyptian Arabic corpus (~8.59B tokens total) |
| Tokens processed | 5.51B tokens (64.1% of dataset) |
| Training steps | 42,000 |
| Learning rate | 1e-5 (cosine decay) |
| Sequence length | 4096 |
| Batch size | 2 per GPU × 8 grad accum × 2 GPUs = 32 sequences/step (131,072 tokens/step at 4096 seq length) |
| Framework | ms-swift + DeepSpeed ZeRO-1 |
| Final loss | 1.824 |
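The batch-size and token-count figures in the table are mutually consistent, as a quick check shows:

```python
per_gpu_batch = 2
grad_accum = 8
num_gpus = 2
seq_len = 4096

# Effective tokens consumed per optimizer step
tokens_per_step = per_gpu_batch * grad_accum * num_gpus * seq_len
print(tokens_per_step)  # 131072

# 42,000 steps at this rate matches the stated 5.51B tokens processed
total_tokens = 42_000 * tokens_per_step
print(round(total_tokens / 1e9, 2))  # 5.51
```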
Dataset composition:
- 51.7% Egyptian Arabic (web, subtitles, social media, educational)
- 22.1% Modern Standard Arabic (MSA)
- 13.8% English
- 12.4% Code
Stage 3 — Supervised Fine-Tuning (SFT)
| Parameter | Value |
|---|---|
| Dataset | MBZUAI-Paris/Egyptian-SFT-Mixture (400K samples) |
| Epochs | 2 |
| Learning rate | 5e-6 (cosine decay) |
| Final eval loss | 1.668 |
| Final token accuracy | 67.01% |
| Training time | ~19 hours |
Context Extension — YaRN
After SFT, the context window was extended from 32K to 64K tokens using YaRN (Yet another RoPE extensioN):
"rope_scaling": {
"rope_type": "yarn",
"factor": 2.0,
"original_max_position_embeddings": 32768
}
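With this config, the usable context is the original window multiplied by the YaRN scaling factor; a trivial check of the arithmetic:

```python
rope_scaling = {
    "rope_type": "yarn",
    "factor": 2.0,
    "original_max_position_embeddings": 32768,
}

# YaRN rescales RoPE frequencies so that positions up to
# factor × original_max_position_embeddings remain well-behaved.
extended_context = int(rope_scaling["factor"] * rope_scaling["original_max_position_embeddings"])
print(extended_context)  # 65536
```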
Evaluation Results
All evaluations use zero-shot log-likelihood scoring (the same methodology as the NileChat paper). HellaSwag uses length-normalized accuracy (acc_norm); all other benchmarks use unnormalized accuracy (acc). Published baselines are from the NileChat paper (Table 1); Fattah rows use our custom evaluation harness with identical zero-shot methodology.
Arabic Script Benchmarks — Full Comparison
| Model | Params | MMLU | Belebele | HellaSwag† | PIQA | WinoGrande | OpenBookQA | Avg |
|---|---|---|---|---|---|---|---|---|
| Nile-Chat-12B | 12B | 62.59 | 70.69 | 64.04 | 63.53 | 42.06 | 53.13 | 59.34 |
| gemma-3-12b-it | 12B | 61.55 | 77.00 | 49.49 | 63.53 | 38.03 | 48.86 | 56.41 |
| Qwen2.5-14B-Instruct | 14B | 60.81 | 72.33 | 55.84 | 59.97 | 38.26 | 50.28 | 56.25 |
| Nile-Chat-3x4B-A6B | MoE | 52.13 | 75.44 | 59.30 | 57.91 | 41.16 | 48.39 | 55.72 |
| Nile-Chat-2x4B-A6B | MoE | 52.05 | 73.89 | 59.69 | 62.26 | 41.61 | 44.07 | 55.60 |
| AceGPT-v2-8b-chat | 8B | 55.25 | 73.33 | 53.14 | 58.39 | 39.82 | 47.16 | 54.52 |
| Nile-Chat-4B | 4B | 50.25 | 68.56 | 55.92 | 61.87 | 40.94 | 46.02 | 53.93 |
| c4ai-command-r7b | 7B | 70.67 | 61.84 | 50.39 | 57.20 | 36.91 | 46.02 | 53.84 |
| ALLaM-7B-Instruct | 7B | 67.67 | 66.10 | 57.29 | 62.18 | 40.04 | 67.10 | 60.06 |
| gemma-2-9b-it | 9B | 49.44 | 61.35 | 49.53 | 61.79 | 35.79 | 48.01 | 50.99 |
| jais-adapted-13b-chat | 13B | 50.03 | 65.33 | 47.53 | 56.72 | 37.14 | 41.76 | 49.75 |
| jais-family-13b-chat | 13B | 44.85 | 66.33 | 52.99 | 57.91 | 36.91 | 38.64 | 49.61 |
| jais-family-6p7b-chat | 7B | 42.60 | 57.33 | 49.18 | 62.23 | 33.33 | 37.50 | 47.03 |
| gemma-3-4b-it | 4B | 38.56 | 60.32 | 42.56 | 56.49 | 35.79 | 46.73 | 46.74 |
| Qwen2.5-7B-Instruct | 7B | 64.22 | 58.02 | 45.47 | 56.41 | 38.70 | 11.34 | 45.69 |
| jais-adapted-7b-chat | 7B | 40.96 | 55.67 | 40.85 | 56.50 | 32.89 | 42.33 | 44.87 |
| Llama-3.1-8B-Instruct | 8B | 55.89 | 57.97 | 43.10 | 54.27 | 35.57 | 9.06 | 42.64 |
| Fattah-2.5B (post-SFT) ⭐ | 2.5B | 38.40 | 40.78 | 24.00 | 61.30 | 49.40 | 27.96 | 40.31 |
† HellaSwag uses acc_norm (length-normalized accuracy); all other benchmarks use acc.
‡ Published baselines are from the NileChat paper (Table 1); these are instruction-tuned + RLHF-aligned models.
⭐ Best Fattah checkpoint (pre-DPO).
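Zero-shot log-likelihood scoring picks, for each question, the answer option whose text the model assigns the highest total (for acc) or length-normalized (for acc_norm) log-probability. A minimal sketch with made-up log-probs:

```python
def pick_option(option_logprobs, normalize=False):
    # Each option is (per-token log-probs, character length); the chosen
    # answer is the argmax of total or length-normalized log-likelihood.
    scores = []
    for token_logprobs, char_len in option_logprobs:
        total = sum(token_logprobs)
        scores.append(total / char_len if normalize else total)
    return scores.index(max(scores))

# Toy log-probs for two answer options (fabricated numbers for illustration)
opts = [([-2.0, -3.0], 10), ([-1.0, -1.5], 12)]
print(pick_option(opts))                  # 1  (acc)
print(pick_option(opts, normalize=True))  # 1  (acc_norm)
```

Length normalization matters for benchmarks like HellaSwag, where correct continuations vary widely in length and longer options are otherwise penalized.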
Key Highlights
- PIQA (61.3%) — Fattah outperforms Qwen2.5-7B (56.4%), gemma-3-4b (56.5%), Llama-3.1-8B (54.3%), and all jais models despite being 2.5B
- WinoGrande (49.4%) — Fattah scores higher than every published baseline in the table, including models 3–5× larger
- Average gap — Fattah post-SFT (40.31%) is behind Nile-Chat-4B (53.93%) by 13.6 points; DPO alignment is expected to close this gap significantly
- Comparable baselines — the fairest comparison is gemma-3-4b-it (4B, 46.74%); Fattah is 2.5B and pre-DPO, 6.4 points behind a fully aligned 4B model
Full Training Journey (Base → DUS → CPT → SFT)
| Benchmark | Base 1.7B | DUS 2.5B | Post-CPT | Post-SFT | Net (Base→SFT) |
|---|---|---|---|---|---|
| EgyptianMMLU | 34.07% | 29.20% | 37.07% | 38.40% | +4.33% ✅ |
| EgyptianPIQA | 54.80% | 51.90% | 61.10% | 61.30% | +6.50% ✅ |
| Belebele-Arz | 37.00% | 32.78% | 41.56% | 40.78% | +3.78% ✅ |
| EgyHellaSwag | 25.00% | 23.60% | 21.40% | 24.00% | −1.00% ⚠️ |
| WinoGrande | 49.40% | 49.40% | 49.40% | 49.40% | 0.00% ➡️ |
| OpenBookQA | 21.03% | 17.67% | 27.74% | 27.96% | +6.93% ✅ |
| Average | 36.88% | 34.09% | 39.71% | 40.31% | +3.43% ✅ |
| EGY Perplexity | 18.84 | 46.31 | 6.69 | — | −12.15 ✅ |
Key observations:
- DUS surgery caused an expected temporary regression (34.09%): although the inserted layers are copies of existing ones, duplicating them disrupts the residual stream until further training re-calibrates the deeper stack
- CPT recovered and surpassed the base (39.71%), acquiring strong Egyptian Arabic dialect knowledge
- SFT further improved average to 40.31%, with MMLU +1.33% and HellaSwag recovering from 21.4% → 24.0%
- EGY Perplexity improvement of ×2.8 (18.84 → 6.69) confirms deep dialect acquisition during CPT
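Perplexity is simply the exponential of the mean per-token negative log-likelihood, which is why the CPT loss and the EGY perplexity move together (the table's 6.69 is measured on a held-out Egyptian corpus, so it differs from the raw exp of the training loss):

```python
import math

def perplexity(mean_nll):
    # Perplexity = exp(mean per-token negative log-likelihood)
    return math.exp(mean_nll)

# e.g. the final CPT training loss of 1.824 corresponds to an
# in-distribution perplexity of about:
print(round(perplexity(1.824), 2))  # 6.2
```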
Usage
Installation
```bash
pip install "transformers>=4.51.0" torch accelerate  # quotes stop the shell treating >= as a redirect
```
Basic Chat
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "belal212/Fattah-2.5B-preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {
        "role": "system",
        # "You are Fattah, a smart and helpful assistant who speaks Egyptian Arabic."
        "content": "أنت فتاح، مساعد ذكي ومفيد بتتكلم العربي المصري."
    },
    {
        "role": "user",
        # "Tell me about Cairo"
        "content": "كلمني عن القاهرة"
    }
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # disable thinking mode for conversational use
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True
)
print(response)
```
With Thinking Mode (for complex reasoning)
```python
messages = [
    {
        "role": "system",
        # "You are Fattah, a smart assistant who thinks step by step before answering."
        "content": "أنت فتاح، مساعد ذكي بتفكر خطوة بخطوة قبل ما تجاوب."
    },
    {
        "role": "user",
        # "What's the best sorting algorithm in Python?"
        "content": "ازاي أحسن خوارزمية للـ sorting في Python؟"
    }
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # activate <think> mode
)
```
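Assuming the model inherits Qwen3's convention of wrapping its reasoning in `<think>…</think>` tags, the decoded output can be split into the reasoning trace and the final answer with a hypothetical helper like this:

```python
import re

def split_thinking(text):
    """Separate a Qwen3-style '<think>...</think>' block from the final answer."""
    m = re.search(r"<think>(.*?)</think>\s*(.*)", text, re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", text.strip()  # no thinking block present

# Toy decoded output, not real model output
thinking, answer = split_thinking("<think>Compare built-in sorts.</think> Use sorted(); it's Timsort.")
print(answer)  # Use sorted(); it's Timsort.
```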
Intended Use
Fattah is designed for:
- ✅ Egyptian Arabic conversational AI
- ✅ Question answering in Egyptian dialect
- ✅ Text generation and creative writing in Egyptian Arabic
- ✅ RAG-based knowledge retrieval systems
- ✅ Foundation for Fattah-Coding (Python + React/TS specialist — coming soon)
- ✅ Agent systems requiring Egyptian Arabic understanding
Limitations
- Factual hallucination: As a 2.5B model without DPO alignment, Fattah may confidently generate incorrect facts. A DPO-aligned version is in development.
- Knowledge cutoff: Training data has a knowledge cutoff. Recent events are not known.
- Dialect coverage: Optimized for Egyptian Arabic. Performance on other Arabic dialects is not guaranteed.
- Model size: At 2.5B parameters, Fattah cannot match the factual depth of larger models. Use RAG for knowledge-intensive applications.
- Pre-DPO: This version has not undergone preference optimization. Responses may occasionally be over-cautious or inconsistent in style.
Roadmap
| Version | Status | Description |
|---|---|---|
| Fattah-2.5B | ✅ Released | CPT + SFT, Egyptian Arabic assistant |
| Fattah-2.5B-v2 | 🔄 In progress | + DPO alignment (Egyptian-DPO-Mixture) |
| Fattah-Python-2.5B | ⏳ Planned | Fattah + Python/AI coding specialization |
| Fattah-React-2.5B | ⏳ Planned | Fattah + React/TypeScript specialization |
| Fattah-Coding-MoE | ⏳ Planned | MoE with LLM-gated routing between Python + React experts |
Training Infrastructure
- GPUs: 2× NVIDIA A6000 48GB
- Framework: ms-swift 4.0.2
- Distributed: DeepSpeed ZeRO Stage 1
- Attention: Flash Attention 2.3.6
- Mixed precision: bfloat16
- Total compute: ~60 GPU-hours (CPT) + ~19 GPU-hours (SFT)
Citation
If you use Fattah in your research, please cite:
```bibtex
@misc{fattah2026,
  title = {Fattah: Egyptian Arabic LLM via Depth-Up Scaling and Continual Pre-Training},
  author = {Belal},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/belal212/Fattah-2.5B-preview}},
  note = {Pre-DPO version}
}
```
Acknowledgements
- Qwen Team for the Qwen3-1.7B-Base model
- MBZUAI-Paris for the Egyptian-SFT-Mixture dataset and NileChat benchmarks
- UBC-NLP for the NileChat pre-training corpus
- ms-swift for the training framework
Fattah — Opening the doors of AI for Egyptian Arabic speakers