	---
	language:
	- id
	- ace
	- ban
	- bjn
	- bug
	- jav
	- mad
	- min
	- sun
	- bbc
	- eng
	library_name: transformers
	pipeline_tag: text-classification
	tags:
	- text-classification
	- hate-speech-detection
	- abusive-language-detection
	- multilabel-classification
	- indonesian
	- multilingual
	- social-media
	- natural-language-processing
	- xlm-roberta
	license: apache-2.0
	metrics:
	- accuracy
	- f1
	base_model:
	- FacebookAI/xlm-roberta-base
	---

	# Hate Speech & Abusive Language Detection (Multilabel)
	Multilingual Indonesian & English — XLM-RoBERTa

	This repository provides a fine-tuned XLM-RoBERTa model for multilabel hate content detection in social media text.
	The model is designed to identify Hate Speech and Abusive Language simultaneously across Indonesian, regional Indonesian languages, and English, particularly in noisy and informal online conversations.

	---

	## 🚀 Highlights

	- Multilabel classification: Hate Speech & Abusive Language
	- Supports overlapping labels in a single text
	- Multilingual (Indonesian + English)
	- Robust on informal and user-generated content
	- Ready to use with Hugging Face `pipeline`
	- Suitable for content moderation and safety systems

	---

	## 🌍 Supported Languages

	- 🇮🇩 Bahasa Indonesia
	- Bahasa Melayu
	- Indonesian regional languages (Acehnese, Banjarese, Buginese, Javanese, Madurese, Minangkabau, Sundanese, etc.)
	- 🇬🇧 English

	---

	## 📊 Model Performance

	> Performance metrics are reported on a held-out validation set.

	| Metric          | Score  |
	|-----------------|--------|
	| Precision       | 0.9249 |
	| Recall          | 0.9300 |
	| F1 (Macro)      | 0.9274 |
	| F1 (Weighted)   | 0.9269 |
	| Training Loss   | 0.1181 |
	| Validation Loss | 0.2070 |

	(Exact scores may vary depending on evaluation split and threshold.)
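
The table reports both a macro and a weighted F1. As a minimal sketch of how the two averages differ, the snippet below computes both from per-label counts. The counts are hypothetical and are not this model's evaluation data; the helper name `f1` is illustrative only.

```python
def f1(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical confusion counts per label: (tp, fp, fn).
counts = {
    "HATESPEECH": (60, 20, 20),
    "ABUSIVE": (180, 20, 20),
}

per_label = {label: f1(*c) for label, c in counts.items()}
# Support is the number of true examples of each label (tp + fn).
support = {label: tp + fn for label, (tp, fp, fn) in counts.items()}

# Macro F1 treats every label equally; weighted F1 weights by support.
macro_f1 = sum(per_label.values()) / len(per_label)
weighted_f1 = sum(per_label[k] * support[k] for k in per_label) / sum(support.values())

print(round(macro_f1, 4), round(weighted_f1, 4))  # 0.825 0.8571
```

When one label is much more frequent than the other, the weighted figure moves toward that label's F1, which is why the two numbers can differ.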

	---

	## ⚙️ Usage

	### Installation
	```bash
	pip install transformers torch
	```

	### Single Prediction

	```python
	from transformers import pipeline

	# top_k=None returns a score for every label (replaces the deprecated return_all_scores=True).
	classifier = pipeline(
	    task="text-classification",
	    model="nahiar/hatespeech-abusive-xlm-roberta-v1",
	    top_k=None,
	)

	result = classifier("Dasar bodoh, otak udang!")
	print(result)
	```

	Output

	```text
	[
	    {'label': 'HATESPEECH', 'score': 0.9123},
	    {'label': 'ABUSIVE', 'score': 0.9841}
	]
	```

	> Because this is a multilabel model, more than one label can be active for a single input.
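
Both labels can be active because each score is computed independently. The sketch below illustrates this, assuming the model config sets `problem_type` to `multi_label_classification`, in which case the pipeline applies a sigmoid to each logit rather than a softmax across labels. The logits, the 0.5 default threshold, and the helper `active_labels` are hypothetical and not part of this repository.

```python
import math

def sigmoid(x):
    """Map a raw logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def active_labels(preds, threshold=0.5):
    """Return the labels whose score is at or above the threshold."""
    return [p["label"] for p in preds if p["score"] >= threshold]

# Hypothetical logits for one text, not real model output.
logits = {"HATESPEECH": 2.0, "ABUSIVE": 3.0}
preds = [{"label": name, "score": sigmoid(z)} for name, z in logits.items()]

# Sigmoid scores are independent, so they do not have to sum to 1.
print(sum(p["score"] for p in preds) > 1.0)  # True
print(active_labels(preds))                  # ['HATESPEECH', 'ABUSIVE']
print(active_labels(preds, threshold=0.9))   # ['ABUSIVE']
```

The threshold is a tuning choice: raising it trades recall for precision, so it is worth validating on data from the target platform.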

	---

	## 🏷️ Label Definitions

	```text
	HATESPEECH → Content that attacks or demeans a group based on identity
	ABUSIVE → Insulting, offensive, or aggressive language that does not target a protected group
	```

	---

	## 📦 Batch Inference

	```python
	# Reuses the `classifier` created in the Single Prediction example.
	texts = [
	    "Dasar kaum ini selalu bikin rusuh",
	    "Kamu memang bodoh dan tidak berguna",
	    "Saya tidak setuju dengan pendapat kamu",
	]

	# One list of per-label scores is returned for each input text.
	results = classifier(texts)

	for text, preds in zip(texts, results):
	    labels = [(p["label"], round(p["score"], 4)) for p in preds]
	    print(text, "→", labels)
	```

	---

	## 🏗️ Training Configuration

	| Parameter         | Value                     |
	| ----------------- | ------------------------- |
	| Base Model        | xlm-roberta-base          |
	| Task Type         | Multilabel Classification |
	| Training Strategy | Fine-tuning               |
	| Epochs            | Multiple                  |
	| Learning Rate     | 2e-5                      |
	| Batch Size        | 16                        |
	| Training Date     | 2025-12-18                |
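
The card does not state the training loss function. As a rough sketch, assuming fine-tuning used the Transformers default for `problem_type="multi_label_classification"` (binary cross-entropy on raw logits, averaged over labels), the objective for one example looks like this. The function name `bce_with_logits` and the logits are hypothetical.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits."""
    losses = []
    for x, y in zip(logits, targets):
        # Numerically stable form of -[y*log(s(x)) + (1-y)*log(1-s(x))],
        # where s is the sigmoid function.
        losses.append(max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x))))
    return sum(losses) / len(losses)

# Hypothetical logits for [HATESPEECH, ABUSIVE]; the text is abusive only.
good = bce_with_logits(logits=[-2.0, 3.0], targets=[0.0, 1.0])
# The same targets with the logits pointing the wrong way.
bad = bce_with_logits(logits=[2.0, -3.0], targets=[0.0, 1.0])

print(good < bad)  # True
```

Each label contributes its own term, so the model can learn to switch Hate Speech and Abusive Language on or off independently.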

	---

	## 🎯 Intended Use

	* Hate speech & abusive language moderation
	* Content safety and compliance systems
	* Social media monitoring dashboards
	* Pre-filtering before sentiment or topic analysis
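
For the pre-filtering use case, one possible approach is to drop any text with at least one label above a threshold before passing the rest to a downstream analysis. The sketch below works on output shaped like the Batch Inference results; the scores, the 0.5 threshold, and the helper `split_by_flag` are hypothetical.

```python
def split_by_flag(texts, results, threshold=0.5):
    """Separate texts into (clean, flagged) using per-label scores."""
    clean, flagged = [], []
    for text, preds in zip(texts, results):
        # A text is flagged if any label reaches the threshold.
        hit = any(p["score"] >= threshold for p in preds)
        (flagged if hit else clean).append(text)
    return clean, flagged

texts = [
    "Kamu memang bodoh dan tidak berguna",
    "Saya tidak setuju dengan pendapat kamu",
]
# Hypothetical scores standing in for classifier(texts) output.
results = [
    [{"label": "HATESPEECH", "score": 0.08}, {"label": "ABUSIVE", "score": 0.97}],
    [{"label": "HATESPEECH", "score": 0.02}, {"label": "ABUSIVE", "score": 0.03}],
]

clean, flagged = split_by_flag(texts, results)
print(clean)    # ['Saya tidak setuju dengan pendapat kamu']
print(flagged)  # ['Kamu memang bodoh dan tidak berguna']
```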

	---

	## ⚠️ Limitations

	* Limited to Hate Speech and Abusive Language labels
	* Does not identify specific hate targets or protected attributes
	* Context-dependent sarcasm may be misclassified
	* Not suitable for legal or policy enforcement without human review

	---

	## 📜 License

	This model is released under the Apache License 2.0 and is free for research and commercial use.

	---

	## 📚 Citation

	```bibtex
	@misc{djunaedi2025hatespeech_multilabel,
	  author    = {Raihan Hidayatulloh Djunaedi},
	  title     = {Multilabel Hate Speech and Abusive Language Detection for Social Media Text},
	  year      = {2025},
	  publisher = {Hugging Face},
	  url       = {https://huggingface.co/nahiar/hatespeech-xlmr-v4}
	}
	```

	---

	## 🙌 Acknowledgements

	* Hugging Face Transformers
	* Facebook AI Research — XLM-RoBERTa