---
language:
- id
- ace
- ban
- bjn
- bug
- jav
- mad
- min
- sun
- bbc
- eng
library_name: transformers
pipeline_tag: text-classification
tags:
- text-classification
- hate-speech-detection
- abusive-language-detection
- multilabel-classification
- indonesian
- multilingual
- social-media
- natural-language-processing
- xlm-roberta
license: apache-2.0
metrics:
- accuracy
- f1
base_model:
- FacebookAI/xlm-roberta-base
---
# Hate Speech & Abusive Language Detection (Multilabel)
**Multilingual Indonesian & English (XLM-RoBERTa)**

This repository provides a fine-tuned **XLM-RoBERTa** model for **multilabel hate-content detection** in social media text.

The model is designed to identify **Hate Speech** and **Abusive Language** simultaneously across **Indonesian**, **regional Indonesian languages**, and **English**, particularly in noisy and informal online conversations.

---
## 🚀 Highlights
- Multilabel classification: **Hate Speech** & **Abusive Language**
- Supports overlapping labels in a single text
- Multilingual: Indonesian, regional Indonesian languages, and English
- Robust to informal, user-generated content
- Ready-to-use with Hugging Face `pipeline`
- Suitable for content moderation and safety systems
---
## 🌐 Supported Languages
- 🇮🇩 Bahasa Indonesia
- Bahasa Melayu
- Indonesian regional languages (Acehnese, Balinese, Banjarese, Buginese, Javanese, Madurese, Minangkabau, Sundanese, Toba Batak, etc.)
- 🇬🇧 English
---
## 📊 Model Performance

> Performance metrics are reported on a held-out validation set.

| Metric          | Score  |
|-----------------|--------|
| Precision       | 0.9249 |
| Recall          | 0.9300 |
| F1 (Macro)      | 0.9274 |
| F1 (Weighted)   | 0.9269 |
| Training Loss   | 0.1181 |
| Validation Loss | 0.2070 |

*(Exact scores may vary depending on the evaluation split and decision threshold.)*
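
The table reports both macro and weighted F1. For multilabel data, macro F1 is the unweighted mean of the per-label F1 scores, while weighted F1 weights each label's F1 by its support (the number of texts that truly carry that label). The sketch below computes both on invented toy multi-hot data; it is illustrative only, not the evaluation code behind the numbers above, and the column order `[HATESPEECH, ABUSIVE]` is an assumption. scikit-learn's `f1_score` with `average="macro"` or `average="weighted"` computes the same quantities.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[int]]  # rows = texts, columns = labels (multi-hot, 0/1)


def per_label_f1(y_true: Matrix, y_pred: Matrix) -> Tuple[List[float], List[int]]:
    """Return the F1 score and support (true positives + false negatives) for each label."""
    f1s: List[float] = []
    supports: List[int] = []
    for j in range(len(y_true[0])):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        f1s.append(2 * tp / denom if denom else 0.0)
        supports.append(tp + fn)
    return f1s, supports


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Unweighted mean of per-label F1: every label counts equally."""
    f1s, _ = per_label_f1(y_true, y_pred)
    return sum(f1s) / len(f1s)


def weighted_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Per-label F1 weighted by each label's support."""
    f1s, supports = per_label_f1(y_true, y_pred)
    total = sum(supports)
    return sum(f * s for f, s in zip(f1s, supports)) / total if total else 0.0


# Toy data, columns = [HATESPEECH, ABUSIVE] (assumed order)
y_true = [[1, 1], [0, 1], [0, 1], [0, 0]]
y_pred = [[1, 1], [0, 1], [0, 0], [1, 0]]
print(round(macro_f1(y_true, y_pred), 4))     # 0.7333
print(round(weighted_f1(y_true, y_pred), 4))  # 0.7667
```

In this toy data the more frequent label (ABUSIVE) also has the higher F1, so weighted F1 comes out above macro F1.
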
---
## ⚙️ Usage
### Installation
```bash
pip install transformers torch
```
### Single Prediction
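```python
from transformers import pipeline

classifier = pipeline(
    task="text-classification",
    model="nahiar/hatespeech-abusive-xlm-roberta-v1",
)

# top_k=None returns a score for every label (replacing the deprecated return_all_scores=True);
# function_to_apply="sigmoid" scores each label independently, as multilabel output requires.
# Input text means roughly "You idiot, shrimp brain!"
result = classifier("Dasar bodoh, otak udang!", top_k=None, function_to_apply="sigmoid")
print(result)
```
**Output**
```text
[
  {'label': 'ABUSIVE', 'score': 0.9841},
  {'label': 'HATESPEECH', 'score': 0.9123}
]
```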
> Because this is a **multilabel model**, each label receives its own independent score (the scores need not sum to 1), so more than one label can be active for a single input.
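
Turning these independent scores into predicted labels requires a decision threshold, which this card does not specify. A minimal sketch, assuming a single hypothetical threshold of 0.5 for both labels (worth tuning on validation data) and the list-of-dicts format the pipeline returns when called with `top_k=None`:

```python
from typing import Any, Dict, List

# Hypothetical default: the card does not specify a threshold, so tune this on validation data.
DEFAULT_THRESHOLD = 0.5


def active_labels(preds: List[Dict[str, Any]], threshold: float = DEFAULT_THRESHOLD) -> List[str]:
    """Return every label whose independent score is at or above the threshold."""
    return [p["label"] for p in preds if p["score"] >= threshold]


# Scores in the list-of-dicts format returned by the pipeline with top_k=None
example = [
    {"label": "ABUSIVE", "score": 0.9841},
    {"label": "HATESPEECH", "score": 0.9123},
]
print(active_labels(example))                  # ['ABUSIVE', 'HATESPEECH']
print(active_labels(example, threshold=0.95))  # ['ABUSIVE']
```

A single global threshold is the simplest choice; per-label thresholds are a natural extension if one label needs higher precision than the other.
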
---
## ๐Ÿท๏ธ Label Definitions
```text
HATESPEECH → Content that attacks or demeans a group based on identity
ABUSIVE    → Insulting, offensive, or aggressive language without protected targets
```
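
Because the two labels can co-occur, a training target for this kind of model is typically a multi-hot vector with one slot per label, rather than a single class index. A minimal sketch, assuming the hypothetical label order `HATESPEECH`, `ABUSIVE` (the model's actual order is defined in its `config.id2label`):

```python
from typing import List, Sequence

# Assumed label order for illustration; the real index mapping lives in the model's config.id2label.
LABELS = ("HATESPEECH", "ABUSIVE")


def to_multi_hot(active: Sequence[str], labels: Sequence[str] = LABELS) -> List[float]:
    """Encode the set of labels that apply to one text as a multi-hot target vector."""
    unknown = set(active) - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    # Floats, because binary cross-entropy targets are usually float tensors
    return [1.0 if name in active else 0.0 for name in labels]


print(to_multi_hot(["HATESPEECH", "ABUSIVE"]))  # [1.0, 1.0]  both labels apply
print(to_multi_hot(["ABUSIVE"]))                # [0.0, 1.0]
print(to_multi_hot([]))                         # [0.0, 0.0]  neither label applies
```
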
---
## 📦 Batch Inference
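```python
from transformers import pipeline

classifier = pipeline(
    task="text-classification",
    model="nahiar/hatespeech-abusive-xlm-roberta-v1",
)

texts = [
    "Dasar kaum ini selalu bikin rusuh",       # "These people always cause trouble"
    "Kamu memang bodoh dan tidak berguna",     # "You really are stupid and useless"
    "Saya tidak setuju dengan pendapat kamu",  # "I disagree with your opinion"
]

# top_k=None returns every label's score per text; sigmoid keeps the label scores independent
results = classifier(texts, top_k=None, function_to_apply="sigmoid")
for text, preds in zip(texts, results):
    labels = [(p["label"], round(p["score"], 4)) for p in preds]
    print(text, "→", labels)
```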
---
## ๐Ÿ—๏ธ Training Configuration
| Parameter | Value |
| ----------------- | ------------------------- |
| Base Model | xlm-roberta-base |
| Task Type | Multilabel Classification |
| Training Strategy | Fine-tuning |
| Epochs | Multiple |
| Learning Rate | 2e-5 |
| Batch Size | 16 |
| Training Date | 2025-12-18 |
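
The card does not name the loss function. For multilabel fine-tuning, a common choice, and the loss Transformers' sequence-classification models apply when `problem_type="multi_label_classification"`, is binary cross-entropy on each label's logit. A minimal pure-Python sketch of that loss, offered as an assumption about the training setup rather than the author's confirmed code (the logits in the example are hypothetical):

```python
import math
from typing import Sequence


def bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy over labels, computed from raw logits.

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)),
    which equals -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))].
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = sum(
        max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        for z, y in zip(logits, targets)
    )
    return total / len(logits)


# One text with target [HATESPEECH=1, ABUSIVE=0] (hypothetical logits)
print(round(bce_with_logits([2.0, -1.0], [1.0, 0.0]), 4))  # 0.2201
```

Because each label has its own sigmoid term, the loss never forces the two labels to compete, which is what lets a single text be trained as both HATESPEECH and ABUSIVE.
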
---
## 🎯 Intended Use
* Hate speech & abusive language moderation
* Content safety and compliance systems
* Social media monitoring dashboards
* Pre-filtering before sentiment or topic analysis
---
## ⚠️ Limitations
* Limited to **Hate Speech** and **Abusive Language** labels
* Does not identify specific hate targets or protected attributes
* Context-dependent sarcasm may be misclassified
* Not suitable for legal or policy enforcement without human review
---
## 📜 License
This model is released under the **Apache License 2.0** and is free for research and commercial use.

---
## 📚 Citation
```bibtex
@misc{djunaedi2025hatespeech_multilabel,
author = {Raihan Hidayatulloh Djunaedi},
title = {Multilabel Hate Speech and Abusive Language Detection for Social Media Text},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/nahiar/hatespeech-xlmr-v4}
}
```
---
## 🙌 Acknowledgements
* Hugging Face Transformers
* Facebook AI Research (XLM-RoBERTa)