---
library_name: transformers
tags:
- readability
license: mit
base_model:
- CAMeL-Lab/bert-base-arabic-camelbert-msa
pipeline_tag: text-classification
---
# CAMeLBERT+Word+CE Readability Model

## Model description
**CAMeLBERT+Word+CE** is an Arabic readability assessment model built by fine-tuning the **CAMeLBERT-msa** model with a cross-entropy (**CE**) loss.
For fine-tuning, we used the **Word** input variant of [BAREC-Corpus-v1.0](https://huggingface.co/datasets/CAMeL-Lab/BAREC-Corpus-v1.0).
Details of our fine-tuning procedure and the hyperparameters we used can be found in our paper *"[A Large and Balanced Corpus for Fine-grained Arabic Readability Assessment](https://arxiv.org/abs/2502.13520)."*


## Intended uses
You can use the CAMeLBERT+Word+CE model directly as part of a transformers text-classification pipeline.


## How to use
To use the model with a transformers pipeline:

```python
>>> from transformers import pipeline
>>> readability = pipeline("text-classification", model="CAMeL-Lab/readability-camelbert-word-CE")
>>> # "And he told him that he loves to eat food a lot"
>>> text = 'و قال له انه يحب اكل الطعام بكثره'
>>> # The model's labels have the form 'LABEL_k' (0-indexed); readability levels are 1-indexed
>>> readability_level = int(readability(text)[0]['label'][6:]) + 1
>>> print("readability level: {}".format(readability_level))
readability level: 10
```
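The index arithmetic in the snippet above can be wrapped in a small helper. This is an illustrative sketch, not part of the model's API (the function name `label_to_level` is our own): it converts the pipeline's 0-indexed `LABEL_k` strings into the 1-indexed readability levels used in the example.

```python
def label_to_level(label: str) -> int:
    """Map a pipeline label such as 'LABEL_9' to a 1-indexed readability level.

    Hypothetical helper: splits on '_' to recover the 0-indexed class id,
    then adds 1, mirroring the `int(label[6:]) + 1` step in the example above.
    """
    return int(label.split("_")[-1]) + 1


print(label_to_level("LABEL_9"))  # prints 10, matching the example output
```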


## Citation
```bibtex
@inproceedings{elmadani-etal-2025-readability,
    title = "A Large and Balanced Corpus for Fine-grained Arabic Readability Assessment",
    author = "Elmadani, Khalid N. and
      Habash, Nizar and
      Taha-Thomure, Hanada",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics"
}
```