Political Persuasion Hybrid Model

This repository contains a runnable hybrid multi-label classifier for rhetorical strategy detection in political speech segments.

It predicts six labels:

emotion_appeal
authority_appeal
polarization
presumption
exaggeration
rhetorical_framing

Model Description

The hybrid architecture combines two feature sets:

TF-IDF features (unigrams and bigrams)
DistilBERT embeddings (distilbert-base-uncased, mean-pooled)

It then trains one RandomForest classifier per label and applies a label-specific decision threshold to each classifier's output.
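
The feature construction and per-label training described above can be sketched roughly as follows. This is a minimal illustration, not the repository's training code: `toy_token_embeddings` is a hypothetical stand-in that returns random vectors instead of calling distilbert-base-uncased, the two feature sets are assumed to be joined by simple column-wise concatenation, and the texts, targets, and forest size are made up.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

LABELS = ["emotion_appeal", "polarization"]  # two of the six labels, for brevity


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per text, skipping padded positions."""
    mask = attention_mask[:, :, None].astype(float)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


def toy_token_embeddings(texts, dim=8, max_len=6, seed=0):
    """Hypothetical stand-in for DistilBERT's last hidden state (random values)."""
    rng = np.random.default_rng(seed)
    embeddings = rng.normal(size=(len(texts), max_len, dim))
    mask = np.zeros((len(texts), max_len), dtype=int)
    for i, text in enumerate(texts):
        mask[i, : min(len(text.split()), max_len)] = 1
    return embeddings, mask


texts = [
    "We must act now to protect our families.",
    "They want to destroy everything we built.",
    "The committee meets on Tuesday.",
    "Our children deserve a safe future.",
]
# Toy multi-label targets, one column per entry in LABELS.
y = np.array([[1, 0], [1, 1], [0, 0], [1, 0]])

vectorizer = TfidfVectorizer(ngram_range=(1, 2))  # unigrams and bigrams
tfidf = vectorizer.fit_transform(texts)
dense = mean_pool(*toy_token_embeddings(texts))
# Assumption: the two feature sets are joined by column-wise concatenation.
features = hstack([tfidf, csr_matrix(dense)]).tocsr()

# One independent binary RandomForest per label.
classifiers = {
    label: RandomForestClassifier(n_estimators=20, random_state=0).fit(features, y[:, j])
    for j, label in enumerate(LABELS)
}
# Probability of the positive class for each label, before any thresholding.
probabilities = {
    label: clf.predict_proba(features)[:, 1] for label, clf in classifiers.items()
}
```

In a real run the stand-in would be replaced by the model's token embeddings and attention mask; `mean_pool` itself would stay the same.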

Files in This Repo

artifacts/tfidf_vectorizer.joblib
artifacts/classifier_<label>.joblib (one per label)
artifacts/thresholds.json
artifacts/metadata.json
hybrid_results.csv
results.json
test_predictions.csv
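
As a rough illustration of how the per-label thresholds might be applied at prediction time, the sketch below assumes that artifacts/thresholds.json is a flat object mapping each label name to a probability cut-off. That layout is an assumption, as are the helper name `apply_thresholds`, the fallback of 0.5, and the example numbers.

```python
import json


def apply_thresholds(probabilities, thresholds, default=0.5):
    """Turn per-label probabilities into 0/1 decisions using per-label cut-offs."""
    return {
        label: int(prob >= thresholds.get(label, default))
        for label, prob in probabilities.items()
    }


# Assumed layout of thresholds.json: a flat {label: threshold} object.
# The values here are invented for the example.
thresholds = json.loads('{"emotion_appeal": 0.35, "polarization": 0.6}')
probabilities = {"emotion_appeal": 0.4, "polarization": 0.55}
decisions = apply_thresholds(probabilities, thresholds)
```

With these made-up numbers, emotion_appeal clears its lower cut-off while polarization does not. This shows why per-label thresholds can differ from a uniform 0.5 rule.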

Intended Use

Use this model for research and analysis of rhetorical patterns in political language.

Limitations

Built from a specific political speech dataset and taxonomy.
Performance may not generalize to other domains/languages without re-training.
Should not be used as the sole basis for high-stakes decisions.

Metrics (latest run)

See:

results.json (summary + per-label metrics)
hybrid_results.csv (per-label threshold/F1/precision/recall)
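
For readers who want to relate the per-label precision, recall, and F1 columns to raw predictions, here is a small sketch of the standard definitions. The function names and toy vectors are illustrative only, and the unweighted (macro) average is shown as one common summary, not necessarily the one reported in results.json.

```python
def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall, and F1 for one label (0/1 sequences)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(per_label_truth, per_label_pred):
    """Unweighted mean of the per-label F1 scores."""
    scores = [
        precision_recall_f1(per_label_truth[label], per_label_pred[label])[2]
        for label in per_label_truth
    ]
    return sum(scores) / len(scores)


# Toy data for two labels over five segments.
truth = {"exaggeration": [1, 1, 0, 0, 1], "presumption": [0, 1, 0, 0, 0]}
pred = {"exaggeration": [1, 0, 1, 0, 1], "presumption": [0, 1, 0, 0, 0]}
summary = macro_f1(truth, pred)
```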

How to Run Inference

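The inference CLI lives in the project codebase. To score a single text, point --artifacts-dir at the artifacts directory of a training run (YYYYMMDD_HHMMSS is a placeholder for the run's timestamp):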

python scripts/infer_hybrid_model.py \
  --artifacts-dir results/hybrid_YYYYMMDD_HHMMSS/artifacts \
  --text "We must act now to protect our families."

CSV batch mode:

python scripts/infer_hybrid_model.py \
  --artifacts-dir results/hybrid_YYYYMMDD_HHMMSS/artifacts \
  --input-csv dataset/dataset_for_annotation.csv \
  --text-column text \
  --output results/hybrid_predictions.csv
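
If the CLI needs to be driven from another Python program, the two invocations above could be assembled with a small helper like the one below. `build_infer_command` is a hypothetical wrapper, not part of the repository. It uses only the flags shown above, and its requirement of exactly one of `text` or `input_csv` is a design choice of this sketch, not a documented rule of the script.

```python
import sys


def build_infer_command(artifacts_dir, text=None, input_csv=None,
                        text_column="text", output=None):
    """Build the argument list for scripts/infer_hybrid_model.py."""
    if (text is None) == (input_csv is None):
        raise ValueError("pass exactly one of text or input_csv")
    cmd = [sys.executable, "scripts/infer_hybrid_model.py",
           "--artifacts-dir", artifacts_dir]
    if text is not None:
        # Single-text mode.
        cmd += ["--text", text]
    else:
        # CSV batch mode.
        cmd += ["--input-csv", input_csv, "--text-column", text_column]
        if output is not None:
            cmd += ["--output", output]
    return cmd


cmd = build_infer_command(
    "results/hybrid_YYYYMMDD_HHMMSS/artifacts",  # placeholder path from the example above
    input_csv="dataset/dataset_for_annotation.csv",
    output="results/hybrid_predictions.csv",
)
# To execute from the project root once the path is real:
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the input text contains spaces or quotation marks.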

Training Data

Related dataset repository:

sofiagzzloz/political-persuasion-dataset

Citation

If you use this model, please reference this repository in your project or report.
