Political Persuasion Hybrid Model
This repository contains a runnable hybrid multi-label classifier for rhetorical strategy detection in political speech segments.
It predicts six labels:
- emotion_appeal
- authority_appeal
- polarization
- presumption
- exaggeration
- rhetorical_framing
Model Description
This is a hybrid architecture that combines:
- TF-IDF features (1-2 grams), and
- DistilBERT embeddings (distilbert-base-uncased, mean pooled),
then trains one RandomForest classifier per label with label-specific thresholds.
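The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the repository's training code: the function names (`mean_pool`, `hybrid_features`, `train_per_label`, `predict`), the forest size, and the `toy_embed` stand-in are all assumptions made for the example. In a real run, `embed_fn` would return mean-pooled `distilbert-base-uncased` hidden states; here a random placeholder keeps the sketch runnable without downloading a model.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

LABELS = ["emotion_appeal", "authority_appeal", "polarization",
          "presumption", "exaggeration", "rhetorical_framing"]


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sequence, ignoring padded positions."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


def toy_embed(texts, dim=8):
    """Placeholder for DistilBERT (assumption): random dense vectors."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), dim))


def hybrid_features(texts, vectorizer, embed_fn, fit=False):
    """Concatenate sparse TF-IDF columns with dense sentence embeddings."""
    tfidf = vectorizer.fit_transform(texts) if fit else vectorizer.transform(texts)
    dense = csr_matrix(embed_fn(texts))
    return hstack([tfidf, dense]).tocsr()


def train_per_label(X, Y, labels=LABELS, seed=0):
    """Fit one RandomForest per label column of the 0/1 matrix Y."""
    return {
        name: RandomForestClassifier(n_estimators=50, random_state=seed).fit(X, Y[:, i])
        for i, name in enumerate(labels)
    }


def positive_proba(clf, X):
    """Probability of class 1, or zeros if the label was never positive in training."""
    classes = list(clf.classes_)
    if 1 not in classes:
        return np.zeros(X.shape[0])
    return clf.predict_proba(X)[:, classes.index(1)]


def predict(models, thresholds, X):
    """Apply each label's own threshold to its positive-class probability."""
    return {
        name: (positive_proba(clf, X) >= thresholds[name]).astype(int)
        for name, clf in models.items()
    }
```

Keeping one binary classifier per label means each label can carry its own decision threshold, which is useful when label frequencies differ widely.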
Files in This Repo
- artifacts/tfidf_vectorizer.joblib
- artifacts/classifier_<label>.joblib (one per label)
- artifacts/thresholds.json
- artifacts/metadata.json
- hybrid_results.csv
- results.json
- test_predictions.csv
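For readers who want to load these artifacts directly, a sketch along the following lines may help. It assumes that `thresholds.json` is a flat mapping from label name to threshold; the actual schema is not documented here, so check the file before relying on this. The helper name `load_artifacts` is hypothetical and is not part of the repository.

```python
import json
from pathlib import Path

import joblib


def load_artifacts(artifacts_dir):
    """Load the vectorizer, per-label classifiers, and thresholds from disk."""
    root = Path(artifacts_dir)
    vectorizer = joblib.load(root / "tfidf_vectorizer.joblib")
    # Assumption: thresholds.json looks like {"polarization": 0.4, ...}.
    thresholds = json.loads((root / "thresholds.json").read_text(encoding="utf-8"))
    # One classifier file per label, following the classifier_<label>.joblib pattern.
    classifiers = {
        label: joblib.load(root / f"classifier_{label}.joblib")
        for label in thresholds
    }
    return vectorizer, classifiers, thresholds
```

Because joblib files are pickle-based, they should only be loaded from sources you trust.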
Intended Use
Use this model for research and analysis of rhetorical patterns in political language.
Limitations
- Built from a specific political speech dataset and taxonomy.
- Performance may not generalize to other domains/languages without re-training.
- Should not be used as a sole basis for high-stakes decisions.
Metrics (latest run)
See:
- results.json (summary + per-label metrics)
- hybrid_results.csv (per-label threshold/F1/precision/recall)
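As an illustration of how a per-label threshold and its F1, precision, and recall might be obtained, the sketch below sweeps a grid of thresholds over validation scores and keeps the one with the highest F1. This is an assumed procedure for explanation only; the repository's actual selection criterion and grid are not stated here, and the names `prf` and `best_threshold` are hypothetical.

```python
import numpy as np


def prf(y_true, y_pred):
    """Return (precision, recall, F1) for binary 0/1 arrays."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def best_threshold(y_true, scores, grid=None):
    """Pick the lowest threshold on the grid that maximizes F1."""
    scores = np.asarray(scores)
    grid = np.linspace(0.05, 0.95, 19) if grid is None else grid
    best = {"threshold": None, "precision": 0.0, "recall": 0.0, "f1": -1.0}
    for t in grid:
        p, r, f1 = prf(y_true, (scores >= t).astype(int))
        if f1 > best["f1"]:  # strict comparison keeps the first (lowest) winner
            best = {"threshold": float(t), "precision": p, "recall": r, "f1": f1}
    return best
```

Running `best_threshold` once per label on held-out data yields one row per label with the same four columns listed for hybrid_results.csv.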
How to Run Inference
Inference CLI lives in the project codebase:
python scripts/infer_hybrid_model.py \
--artifacts-dir results/hybrid_YYYYMMDD_HHMMSS/artifacts \
--text "We must act now to protect our families."
CSV batch mode:
python scripts/infer_hybrid_model.py \
--artifacts-dir results/hybrid_YYYYMMDD_HHMMSS/artifacts \
--input-csv dataset/dataset_for_annotation.csv \
--text-column text \
--output results/hybrid_predictions.csv
Training Data
Related dataset repository:
sofiagzzloz/political-persuasion-dataset
Citation
If you use this model, please cite your project/report and reference this repository.