vzani committed on
Commit 83fbe70 (verified)
1 parent: ead6097

Update README.md

Files changed (1):
1. README.md +32 -26
README.md CHANGED
@@ -67,33 +67,29 @@ model-index:
 
  ## Model Overview
 
- This repository contains **BiLSTM** models for **fake news detection in Portuguese**.
- Models are trained and evaluated on corpora derived from the Brazilian Portuguese dataset **[Fake.br](https://github.com/roneysco/Fake.br-Corpus)**.
 
  - **Architecture**: Bidirectional LSTM (Keras)
  - **Task**: Binary text classification (Fake vs. True)
  - **Language**: Portuguese (`pt`)
  - **Framework**: Keras / TensorFlow
 
  ---
 
  ## Available Variants
 
- - **bilstm-combined**
- Fine-tuned on the aligned combined corpus (`data/corpus_*` at project root).
 
- - **bilstm-fake-br**
- Fine-tuned on **Fake.br**.
- The corresponding corpus is available in `corpus/` (including preprocessed and size-normalized texts when applicable).
 
- - **bilstm-faketrue-br**
- Fine-tuned on **FakeTrue.Br**.
- Includes aligned splits and the original CSV when available.
 
- Each variant ships with:
- - `confusion_matrix.png`
- - `final_classification_report.parquet`
- - `final_predictions.parquet`
 
  ---
 
@@ -127,16 +123,6 @@ These files provide per-class performance and prediction logs for reproducibilit
 
  ---
 
- ## Corpus
-
- The corpora used for training and evaluation are provided in the `corpus/` folder.
-
- - **Combined (root folder)**: `corpus_train_df.parquet`, `corpus_test_df.parquet`, `corpus_df.parquet`, `corpus_alinhado_df.parquet`.
- - **Fake.br**: `corpus_train_df.parquet`, `corpus_test_df.parquet`, `corpus_df.parquet`, `corpus_alinhado_df.parquet`.
- - **FakeTrue.Br**: `corpus_train_df.parquet`, `corpus_test_df.parquet`, `corpus_df.parquet`, `corpus_alinhado_df.parquet` and `FakeTrueBr_corpus.csv`.
-
- ---
-
  ## How to Use
 
  This model is a **Keras** model stored as `final_bilstm_model.keras`.
@@ -174,10 +160,30 @@ The expected output is a Tuple where the first entry represents the classificati
  (False, 0.997499808203429)
  ```
 
  ## License
 
- [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 
  ## Citation
 
- Coming soon.
 
  ## Model Overview
 
+ This repository contains a trained **BiLSTM** model for **fake news detection in Portuguese**.
+ The model was trained and evaluated on corpora derived from the Brazilian Portuguese dataset **[Fake.br](https://github.com/roneysco/Fake.br-Corpus)**.
 
  - **Architecture**: Bidirectional LSTM (Keras)
  - **Task**: Binary text classification (Fake vs. True)
  - **Language**: Portuguese (`pt`)
  - **Framework**: Keras / TensorFlow
+ - **Training source code**: https://github.com/viniciuszani/portuguese-fake-new-classifiers
 
  ---
 
  ## Available Variants
 
+ - [**bilstm-combined**](https://huggingface.co/vzani/portuguese-fake-news-classifier-bilstm-combined)
+ Fine-tuned on the [combined dataset](https://huggingface.co/datasets/vzani/corpus-combined) built from Fake.br and FakeTrue.Br.
 
+ - [**bilstm-fake-br**](https://huggingface.co/vzani/portuguese-fake-news-classifier-bilstm-fake-br)
+ Fine-tuned on the [Fake.br dataset](https://huggingface.co/datasets/vzani/corpus-fake-br).
 
+ - [**bilstm-faketrue-br**](https://huggingface.co/vzani/portuguese-fake-news-classifier-bilstm-faketrue-br)
+ Fine-tuned on the [FakeTrue.Br dataset](https://huggingface.co/datasets/vzani/corpus-faketrue-br).
 
+ Each variant has its own confusion matrix, classification report, and predictions stored as artifacts.
 
  ---
 
 
  ---
 
  ## How to Use
 
  This model is a **Keras** model stored as `final_bilstm_model.keras`.
  (False, 0.997499808203429)
  ```
 
+ ## Source code
+
+ The source code that produced this model is available in the following repository:
+ - https://github.com/viniciuszani/portuguese-fake-new-classifiers
+
+ It covers every step: data collection, evaluation, hyperparameter fine-tuning, final model tuning, and publishing to Hugging Face.
+ If you use it, please credit the author and/or cite the work.
+
  ## License
 
+ - Base model BERTimbau: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
+ - Fine-tuned models and corpora: released under the same license for academic and research use.
 
  ## Citation
 
+ ```bibtex
+ @misc{zani2025portuguesefakenews,
+   author  = {ZANI, Vinícius Augusto Tagliatti},
+   title   = {Avaliação comparativa de técnicas de processamento de linguagem natural para a detecção de notícias falsas em Português},
+   year    = {2025},
+   pages   = {61},
+   address = {São Carlos},
+   school  = {Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo},
+   type    = {Trabalho de Conclusão de Curso (MBA em Inteligência Artificial e Big Data)},
+   note    = {Orientador: Prof. Dr. Ivandre Paraboni}
+ }
+ ```
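The README states that each variant is a Keras model stored as `final_bilstm_model.keras`, but the loading code itself falls outside these hunks. The following is a minimal sketch of fetching one variant from the Hugging Face Hub. It assumes that the `.keras` file sits at the root of each model repository linked under "Available Variants" and that it loads without custom objects. `VARIANT_REPOS`, `MODEL_FILENAME`, and `load_variant` are hypothetical names, and the sketch does not cover input preprocessing.

```python
# Hypothetical helper (not from the README): fetch a BiLSTM variant from the Hub.
# Assumptions: `final_bilstm_model.keras` is at the repository root and loads
# with keras.models.load_model without custom objects.

# Repository IDs copied from the "Available Variants" links.
VARIANT_REPOS = {
    "bilstm-combined": "vzani/portuguese-fake-news-classifier-bilstm-combined",
    "bilstm-fake-br": "vzani/portuguese-fake-news-classifier-bilstm-fake-br",
    "bilstm-faketrue-br": "vzani/portuguese-fake-news-classifier-bilstm-faketrue-br",
}

# File name taken from the "How to Use" section.
MODEL_FILENAME = "final_bilstm_model.keras"


def load_variant(variant: str):
    """Download the assumed model file for `variant` and load it with Keras."""
    if variant not in VARIANT_REPOS:
        raise ValueError(
            f"Unknown variant {variant!r}; expected one of {sorted(VARIANT_REPOS)}"
        )
    # Imported lazily so the mapping above can be inspected without these packages.
    from huggingface_hub import hf_hub_download
    import keras

    # hf_hub_download returns the local path of the cached file.
    path = hf_hub_download(repo_id=VARIANT_REPOS[variant], filename=MODEL_FILENAME)
    return keras.models.load_model(path)
```

For example, `model = load_variant("bilstm-fake-br")` should return a Keras model if the assumptions above hold. Before calling `model.predict`, input text would still need the same preprocessing used during training. The training source code repository linked above is the place to confirm that preprocessing.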