CrashRisk-QLD: Fatal binary classifier
LightGBM binary classifier estimating P(fatal) given pre-/at-crash conditions in a Queensland road crash. Severe class imbalance (1.90% positive on the held-out test year); we evaluate primarily on PR-AUC and report a full threshold sweep.
© State of Queensland (Department of Transport and Main Roads) 2026. Licensed CC-BY 4.0. Source: data.qld.gov.au, dataset 'Crash data from Queensland roads', version rqC45037 (2025-06), retrieved 2026-04-30.
Disclaimer. This is a population-level statistical model trained on publicly reported crash data. It is NOT suitable for individual driver risk assessment, insurance underwriting, pre-incident law enforcement targeting, or any decision with legal or financial consequence to an individual. Use it for research, road-safety analysis, and education.
Quick start
from huggingface_hub import hf_hub_download
import joblib, lightgbm as lgb
REPO = "Mattysmittttt/crashrisk-qld-fatal"
booster = lgb.Booster(model_file=hf_hub_download(REPO, "model.txt"))
pre = joblib.load(hf_hub_download(REPO, "preprocessor.joblib"))
# pre.transform(X) -> booster.predict(...) -> P(fatal)
Intended uses
Same as the multi-class severity head: research, education, descriptive analysis. Pair the two heads: the severity head gives a full distribution, the fatal head gives a sharper imbalanced-binary signal.
Out of scope
- Insurance underwriting: using population-level statistical patterns to set individual premiums creates fairness concerns and is outside this model's intended scope.
- Individual driver risk assessment: these features describe road conditions and aggregate vehicle context, not driver behaviour or identity.
- Pre-incident law enforcement targeting: geographic patterns may reflect reporting biases as much as actual risk; using them to pre-target locations creates feedback loops.
- Any decision with legal or financial consequence to a single individual, full stop.
Training data
See the dataset card.
- Train: 152,842
- Test: 14,358 (positives: 273, rate: 1.901%)
Training details
- Objective: binary log-loss (no scale_pos_weight: at ~50× it caused early stopping at iteration 1; imbalance is handled at threshold-selection time via the threshold sweep instead).
- Optuna: 20 trials, TPE sampler, early stopping on val log-loss
- Best iteration: 23
- Best params:
{
"num_leaves": 83,
"learning_rate": 0.05082341959721458,
"feature_fraction": 0.6563696899899051,
"bagging_fraction": 0.9208787923016158,
"bagging_freq": 0,
"min_data_in_leaf": 480,
"lambda_l1": 0.08916674715636552,
"lambda_l2": 6.143857495033091e-07
}
- Features after preprocessing: 155
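The ~50× weight mentioned under the objective is roughly what the widely used negatives-to-positives heuristic for scale_pos_weight gives at this positive rate. A minimal sketch of that arithmetic, using the test-split counts from this card; the train-split positive count is not listed here, so treating the train ratio as similar is an assumption:

```python
# Test-split counts reported in this card. The train-split positive count
# is not listed, so the train ratio is only assumed to be similar.
n_test = 14_358
n_pos = 273

n_neg = n_test - n_pos
pos_rate = n_pos / n_test

# Widely used heuristic for scale_pos_weight: negatives / positives.
implied_weight = n_neg / n_pos

print(f"positive rate: {pos_rate:.3%}")
print(f"neg/pos ratio: {implied_weight:.1f}")
```

A weight of this size rescales the positive-class gradient by about fifty, which is the setting this card reports as having stopped training at iteration 1.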
Evaluation (held-out test = 2024)
- PR-AUC (primary): 0.0737 [95% CI 0.0579, 0.0958]
- ROC-AUC: 0.7858 [0.7623, 0.8093] (sanity check: must NOT exceed 0.95; observed: ok)
- Best-F1 operating point: precision = 0.092, recall = 0.282, F1 = 0.139 (threshold = 0.0620; 95% F1 CI 0.112, 0.165)
- Recall @ Precision = 0.20: 0.026 (threshold = 0.1744)
- Recall @ Precision = 0.50: 0.007 (threshold = 0.2448)
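To put the best-F1 operating point in concrete terms, the reported precision and recall can be converted back into approximate counts on the 273 test-year fatal crashes. This is a rough sketch: the inputs are the rounded figures from this card, so the counts are approximate, and the comparison of PR-AUC against prevalence relies on the standard fact that a random ranker's PR-AUC is close to the positive rate.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Values reported in this card (rounded).
n_test, n_pos = 14_358, 273
precision, recall = 0.092, 0.282
pr_auc = 0.0737

f1 = f1_score(precision, recall)
true_pos = round(recall * n_pos)        # fatal crashes caught (approx.)
flagged = round(true_pos / precision)   # crashes flagged in total (approx.)
lift = pr_auc / (n_pos / n_test)        # PR-AUC relative to prevalence

print(f"F1 = {f1:.3f}")
print(f"~{true_pos} of {n_pos} fatal crashes caught, ~{flagged} crashes flagged")
print(f"PR-AUC is ~{lift:.1f}x the positive rate")
```

At the best-F1 threshold, roughly nine out of ten flagged crashes are therefore not fatal.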
Threshold sweep (compact)
The full table ships with the model artifact as threshold_table.csv.
| threshold | precision | recall | f1 |
|---|---|---|---|
| 0.006 | 0.019 | 1.000 | 0.037 |
| 0.007 | 0.020 | 0.996 | 0.040 |
| 0.007 | 0.021 | 0.993 | 0.042 |
| 0.007 | 0.022 | 0.989 | 0.043 |
| 0.007 | 0.023 | 0.989 | 0.045 |
| 0.008 | 0.024 | 0.985 | 0.047 |
| 0.008 | 0.025 | 0.982 | 0.048 |
| 0.008 | 0.025 | 0.974 | 0.050 |
| 0.008 | 0.026 | 0.974 | 0.051 |
| 0.008 | 0.027 | 0.974 | 0.053 |
| 0.008 | 0.028 | 0.974 | 0.054 |
| 0.009 | 0.028 | 0.963 | 0.055 |
| 0.009 | 0.029 | 0.960 | 0.056 |
| 0.009 | 0.030 | 0.956 | 0.058 |
| 0.009 | 0.030 | 0.941 | 0.059 |
| 0.010 | 0.031 | 0.930 | 0.060 |
| 0.010 | 0.032 | 0.927 | 0.061 |
| 0.010 | 0.032 | 0.916 | 0.062 |
| 0.010 | 0.033 | 0.912 | 0.064 |
| 0.011 | 0.034 | 0.901 | 0.065 |
| 0.011 | 0.035 | 0.890 | 0.067 |
| 0.011 | 0.036 | 0.886 | 0.069 |
| 0.012 | 0.037 | 0.875 | 0.071 |
| 0.012 | 0.037 | 0.850 | 0.071 |
| 0.013 | 0.038 | 0.839 | 0.073 |
| 0.013 | 0.040 | 0.835 | 0.076 |
| 0.014 | 0.041 | 0.821 | 0.077 |
| 0.014 | 0.041 | 0.799 | 0.078 |
| 0.015 | 0.042 | 0.777 | 0.080 |
| 0.016 | 0.043 | 0.747 | 0.080 |
| 0.016 | 0.044 | 0.736 | 0.083 |
| 0.017 | 0.044 | 0.689 | 0.082 |
| 0.018 | 0.044 | 0.656 | 0.082 |
| 0.020 | 0.045 | 0.637 | 0.085 |
| 0.021 | 0.048 | 0.630 | 0.089 |
| 0.022 | 0.049 | 0.597 | 0.090 |
| 0.024 | 0.050 | 0.575 | 0.093 |
| 0.026 | 0.051 | 0.538 | 0.093 |
| 0.028 | 0.054 | 0.516 | 0.097 |
| 0.030 | 0.058 | 0.509 | 0.104 |
| 0.033 | 0.060 | 0.473 | 0.106 |
| 0.036 | 0.064 | 0.447 | 0.111 |
| 0.039 | 0.066 | 0.403 | 0.113 |
| 0.044 | 0.070 | 0.370 | 0.118 |
| 0.049 | 0.077 | 0.337 | 0.125 |
| 0.056 | 0.085 | 0.297 | 0.132 |
| 0.067 | 0.092 | 0.242 | 0.133 |
| 0.083 | 0.088 | 0.154 | 0.112 |
| 0.111 | 0.117 | 0.103 | 0.109 |
| 1.000 | 1.000 | 0.000 | 0.000 |
Pick a threshold for your use case rather than relying on the default 0.5; the default is rarely optimal under heavy class imbalance.
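One way to pick a threshold is to fix a minimum recall and take the highest-precision row that satisfies it. The sketch below does this on a few rows copied from the compact sweep above. The helper names `load_sweep` and `pick_threshold` are hypothetical, and the column names of threshold_table.csv are assumed to match the table header.

```python
import csv
import io

# A few rows copied from the compact sweep; in practice, read
# threshold_table.csv instead (column names assumed to match the header).
SWEEP_CSV = """threshold,precision,recall,f1
0.010,0.031,0.930,0.060
0.020,0.045,0.637,0.085
0.030,0.058,0.509,0.104
0.044,0.070,0.370,0.118
0.067,0.092,0.242,0.133
0.111,0.117,0.103,0.109
"""

def load_sweep(text):
    """Parse sweep rows into dicts of floats."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k: float(v) for k, v in row.items()} for row in reader]

def pick_threshold(rows, min_recall):
    """Return the highest-precision row with recall >= min_recall, or None."""
    eligible = [r for r in rows if r["recall"] >= min_recall]
    return max(eligible, key=lambda r: r["precision"]) if eligible else None

rows = load_sweep(SWEEP_CSV)
choice = pick_threshold(rows, min_recall=0.50)
print(choice)
```

The same pattern works with the constraint flipped (a minimum precision, maximising recall) if false alarms are the more costly error for your analysis.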
Top 20 features (LightGBM gain)
- loc_suburb (gain = 28660)
- count_unit_car (gain = 5619)
- crash_speed_limit (gain = 5061)
- loc_abs_statistical_area_2 (gain = 4840)
- loc_post_code (gain = 4436)
- count_unit_pedestrian (gain = 2189)
- count_unit_motorcycle_moped (gain = 1875)
- crash_hour (gain = 1678)
- count_unit_truck (gain = 1453)
- crash_year (gain = 1014)
- loc_state_electorate (gain = 975)
- crash_lighting_condition_daylight (gain = 878)
- crash_latitude (gain = 748)
- loc_abs_statistical_area_3 (gain = 619)
- crash_longitude (gain = 608)
- crash_road_horiz_align_straight (gain = 601)
- crash_roadway_feature_no roadway feature (gain = 554)
- loc_local_government_area (gain = 513)
- crash_road_horiz_align_curved - view open (gain = 471)
- crash_lighting_condition_darkness - not lighted (gain = 400)
SHAP explainability
See reports/shap_fatal/ in the source repository for global and local
SHAP plots. The same set of physical drivers (speed_limit, lighting,
surface, roadway_feature) consistently dominates, which is the
expected sanity-check signal.
Geographic surface
See reports/maps/model_lga_risk_surface.png (and the interactive HTML
in the same folder) for a per-LGA map of mean P(fatal) under a fixed
conditions grid.
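As one plausible reading of "mean P(fatal) under a fixed conditions grid", the sketch below scores every LGA against the same set of condition combinations and averages the predictions. Everything here is illustrative: `toy_predict` is a stand-in rather than the real model, the grid values and LGA names are made up, and the actual map may have been produced differently.

```python
from itertools import product
from statistics import mean

# Hypothetical grid of conditions held fixed across all LGAs.
GRID = {
    "crash_speed_limit": [50, 60, 100],
    "crash_hour": [8, 20],
}

def toy_predict(row):
    """Stand-in for pre.transform + booster.predict; NOT the real model."""
    base = {"LGA A": 0.010, "LGA B": 0.020}[row["loc_local_government_area"]]
    return base * (row["crash_speed_limit"] / 60)

def lga_risk_surface(lgas, grid, predict):
    """Mean predicted P(fatal) per LGA over every combination in the grid."""
    keys = list(grid)
    surface = {}
    for lga in lgas:
        rows = [
            dict(zip(keys, combo), loc_local_government_area=lga)
            for combo in product(*grid.values())
        ]
        surface[lga] = mean(predict(r) for r in rows)
    return surface

surface = lga_risk_surface(["LGA A", "LGA B"], GRID, toy_predict)
print(surface)
```

Holding the grid fixed means differences between LGAs come only from the location features, not from differing mixes of speed limits or times of day.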
Limitations & biases
Same set as the severity card. The binary head is more sensitive to under-reporting bias than the multi-class one because the positive class is small enough that a few mis-coded rows shift metrics noticeably.
Ethical considerations
- Population-level, not causal: The model encodes correlations between pre-crash conditions and recorded outcomes. It does not assign fault and cannot be read as a statement about individual responsibility.
- Geographic predictions can stigmatise: We publish per-LGA aggregates only, never per-address. Even at LGA level, higher predicted risk reflects historical reporting and demographics as much as it does inherent road danger.
- Demographic features deliberately excluded: We do not include gender, age, or any demographic field, even though some are present in the casualties aggregates. This is to avoid encoding protected-class proxies. Vehicle-type counts are kept because they describe the crash configuration, not the people involved.
- Reporting bias: This is a model of reported crashes, not true crashes. Under-reporting is differential by severity (property-damage-only (PDO) crashes are under-reported, fatal crashes are generally fully reported) and by region.
Citation
@software{crashrisk_qld_fatal_2026,
title = {CrashRisk-QLD fatal binary classifier},
author = {Mattysmittttt},
year = {2026},
url = {https://huggingface.co/Mattysmittttt/crashrisk-qld-fatal},
note = {Trained on Mattysmittttt/qld-traffic-crashes-clean; source data CC-BY 4.0 © State of Queensland (Department of Transport and Main Roads).}
}
License
Released under CC-BY 4.0.