---
library_name: transformers
tags:
- readability
license: mit
base_model:
- CAMeL-Lab/bert-base-arabic-camelbert-msa
pipeline_tag: text-classification
---
# CAMeLBERT+Word+CE Readability Model

## Model description
**CAMeLBERT+Word+CE** is an Arabic readability assessment model built by fine-tuning the **CAMeLBERT-msa** model with a cross-entropy (**CE**) loss.
For fine-tuning, we used the **Word** input variant of [BAREC-Corpus-v1.0](https://huggingface.co/datasets/CAMeL-Lab/BAREC-Corpus-v1.0).
Details of our fine-tuning procedure and the hyperparameters we used can be found in our paper *"[A Large and Balanced Corpus for Fine-grained Arabic Readability Assessment](https://arxiv.org/abs/2502.13520)."*


## Intended uses
You can use the CAMeLBERT+Word+CE model directly as part of a transformers text-classification pipeline.


## How to use
To use the model with a transformers pipeline:

```python
>>> from transformers import pipeline
>>> readability = pipeline("text-classification", model="CAMeL-Lab/readability-camelbert-word-CE")
>>> # "And he told him that he loves to eat food a lot"
>>> text = 'و قال له انه يحب اكل الطعام بكثره'
>>> # The model's labels have the form 'LABEL_k' (0-indexed); readability levels are 1-indexed
>>> readability_level = int(readability(text)[0]['label'][6:]) + 1
>>> print("readability level: {}".format(readability_level))
readability level: 10
```
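The index arithmetic in the snippet above can be wrapped in a small helper. This is an illustrative sketch, not part of the model's API (the function name `label_to_level` is our own): it converts the pipeline's 0-indexed `LABEL_k` strings into the 1-indexed readability levels used in the example.

```python
def label_to_level(label: str) -> int:
    """Map a pipeline label such as 'LABEL_9' to a 1-indexed readability level.

    Hypothetical helper: splits on '_' to recover the 0-indexed class id,
    then adds 1, mirroring the `int(label[6:]) + 1` step in the example above.
    """
    return int(label.split("_")[-1]) + 1


print(label_to_level("LABEL_9"))  # prints 10, matching the example output
```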


## Citation
```bibtex
@inproceedings{elmadani-etal-2025-readability,
    title = "A Large and Balanced Corpus for Fine-grained Arabic Readability Assessment",
    author = "Elmadani, Khalid N. and
      Habash, Nizar and
      Taha-Thomure, Hanada",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics"
}
```