deep-ignorance-unfiltered_unlearned_npo

This model was created by fine-tuning EleutherAI/deep-ignorance-unfiltered with the Negative Preference Optimization (NPO) unlearning algorithm, following Zhang et al. 2024. Unlearning aims to remove specific knowledge from a pretrained language model while preserving its general capabilities.
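
As a rough illustration of the objective, the sketch below computes the Negative Preference Optimization loss from Zhang et al. 2024 on sequence-level log-probabilities, using only the Python standard library. The function names, the use of summed per-sequence log-probabilities, and the way the retain term is combined with the forget term (`forget + retain_weight * retain`) are assumptions made for illustration. They are not taken from the training code behind this model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def npo_loss(policy_logps, ref_logps, beta=0.01):
    """Mean NPO loss over a batch of forget sequences.

    policy_logps: log-prob of each forget sequence under the model being trained.
    ref_logps:    log-prob of the same sequences under the frozen reference model.
    Per-sequence loss: -(2 / beta) * log sigmoid(-beta * (policy_logp - ref_logp)).
    """
    losses = []
    for policy_lp, ref_lp in zip(policy_logps, ref_logps):
        log_ratio = policy_lp - ref_lp
        losses.append(-(2.0 / beta) * log_sigmoid(-beta * log_ratio))
    return sum(losses) / len(losses)


def total_loss(forget_loss, retain_loss, retain_weight=1.5):
    """Assumed combination: NPO forget term plus a weighted retain term."""
    return forget_loss + retain_weight * retain_loss


# At initialization the policy equals the reference, so the log-ratio is 0
# and the loss is (2 / beta) * log(2).
start = npo_loss([-10.0], [-10.0], beta=0.01)

# Lowering the policy log-prob on forget data reduces the loss.
after = npo_loss([-20.0], [-10.0], beta=0.01)
print(start, after)
```

The loss is bounded below by zero, so pushing the forget-set likelihood further down yields diminishing gradient. Zhang et al. cite this property as the reason NPO diverges more slowly than plain gradient ascent.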

Hyperparameters

Parameter Value
Base model EleutherAI/deep-ignorance-unfiltered
Unlearning method Negative Preference Optimization
Learning rate 4.5e-05
Epochs 1
Batch size 32
Max sequence length 512
Optimizer adamw
Gradient clipping 1.0
Gradient accumulation steps 1
Seed 42
W&B / run name npo__ep1_lr4.5e-05_bs32_b0.01_rw1.5_mle512_mli8192
Beta 0.01
Retain weight 1.5
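
The run name appears to encode several of the hyperparameters listed above. The sketch below splits it into fields so the values can be cross-checked against the table. The prefix-to-name mapping (`ep`, `lr`, `bs`, `b`, `rw`) is an assumption inferred from matching values in the table. The `mle` and `mli` tokens are not explained in this card, so they are kept under their raw prefixes rather than interpreted.

```python
import re

# Assumed mapping from run-name prefixes to hyperparameter names,
# inferred from the values in the table (not documented by the author).
PREFIXES = {
    "ep": "epochs",
    "lr": "learning_rate",
    "bs": "batch_size",
    "b": "beta",
    "rw": "retain_weight",
}


def parse_value(text):
    """Return an int when the text is an integer literal, else a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_run_name(run_name):
    """Split a name like 'npo__ep1_lr4.5e-05_...' into a dict of fields."""
    method, _, rest = run_name.partition("__")
    parsed = {"method": method}
    for token in rest.split("_"):
        match = re.fullmatch(r"([a-z]+)([0-9].*)", token)
        if match is None:
            continue  # skip tokens that are not <letters><number>
        prefix, value = match.groups()
        # Unknown prefixes (e.g. 'mle', 'mli') are stored under the raw prefix.
        parsed[PREFIXES.get(prefix, prefix)] = parse_value(value)
    return parsed


fields = parse_run_name("npo__ep1_lr4.5e-05_bs32_b0.01_rw1.5_mle512_mli8192")
print(fields)
```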

Evaluation Results

Benchmark Value
mmlu / acc 0.4405
mmlu / acc_stderr 0.0041
mmlu_abstract_algebra / acc 0.3300
mmlu_abstract_algebra / acc_stderr 0.0473
mmlu_anatomy / acc 0.4667
mmlu_anatomy / acc_stderr 0.0431
mmlu_astronomy / acc 0.4671
mmlu_astronomy / acc_stderr 0.0406
mmlu_business_ethics / acc 0.3800
mmlu_business_ethics / acc_stderr 0.0488
mmlu_clinical_knowledge / acc 0.4604
mmlu_clinical_knowledge / acc_stderr 0.0307
mmlu_college_biology / acc 0.3958
mmlu_college_biology / acc_stderr 0.0409
mmlu_college_chemistry / acc 0.2900
mmlu_college_chemistry / acc_stderr 0.0456
mmlu_college_computer_science / acc 0.3600
mmlu_college_computer_science / acc_stderr 0.0482
mmlu_college_mathematics / acc 0.2700
mmlu_college_mathematics / acc_stderr 0.0446
mmlu_college_medicine / acc 0.4393
mmlu_college_medicine / acc_stderr 0.0378
mmlu_college_physics / acc 0.2549
mmlu_college_physics / acc_stderr 0.0434
mmlu_computer_security / acc 0.6100
mmlu_computer_security / acc_stderr 0.0490
mmlu_conceptual_physics / acc 0.4170
mmlu_conceptual_physics / acc_stderr 0.0322
mmlu_econometrics / acc 0.2544
mmlu_econometrics / acc_stderr 0.0410
mmlu_electrical_engineering / acc 0.4759
mmlu_electrical_engineering / acc_stderr 0.0416
mmlu_elementary_mathematics / acc 0.2857
mmlu_elementary_mathematics / acc_stderr 0.0233
mmlu_formal_logic / acc 0.2302
mmlu_formal_logic / acc_stderr 0.0376
mmlu_global_facts / acc 0.3400
mmlu_global_facts / acc_stderr 0.0476
mmlu_high_school_biology / acc 0.4581
mmlu_high_school_biology / acc_stderr 0.0283
mmlu_high_school_chemistry / acc 0.3251
mmlu_high_school_chemistry / acc_stderr 0.0330
mmlu_high_school_computer_science / acc 0.4800
mmlu_high_school_computer_science / acc_stderr 0.0502
mmlu_high_school_european_history / acc 0.5879
mmlu_high_school_european_history / acc_stderr 0.0384
mmlu_high_school_geography / acc 0.5808
mmlu_high_school_geography / acc_stderr 0.0352
mmlu_high_school_government_and_politics / acc 0.6114
mmlu_high_school_government_and_politics / acc_stderr 0.0352
mmlu_high_school_macroeconomics / acc 0.3590
mmlu_high_school_macroeconomics / acc_stderr 0.0243
mmlu_high_school_mathematics / acc 0.2667
mmlu_high_school_mathematics / acc_stderr 0.0270
mmlu_high_school_microeconomics / acc 0.4076
mmlu_high_school_microeconomics / acc_stderr 0.0319
mmlu_high_school_physics / acc 0.2649
mmlu_high_school_physics / acc_stderr 0.0360
mmlu_high_school_psychology / acc 0.5835
mmlu_high_school_psychology / acc_stderr 0.0211
mmlu_high_school_statistics / acc 0.2778
mmlu_high_school_statistics / acc_stderr 0.0305
mmlu_high_school_us_history / acc 0.5637
mmlu_high_school_us_history / acc_stderr 0.0348
mmlu_high_school_world_history / acc 0.6160
mmlu_high_school_world_history / acc_stderr 0.0317
mmlu_human_aging / acc 0.4843
mmlu_human_aging / acc_stderr 0.0335
mmlu_human_sexuality / acc 0.5802
mmlu_human_sexuality / acc_stderr 0.0433
mmlu_humanities / acc 0.4183
mmlu_humanities / acc_stderr 0.0069
mmlu_international_law / acc 0.6281
mmlu_international_law / acc_stderr 0.0441
mmlu_jurisprudence / acc 0.5648
mmlu_jurisprudence / acc_stderr 0.0479
mmlu_logical_fallacies / acc 0.4847
mmlu_logical_fallacies / acc_stderr 0.0393
mmlu_machine_learning / acc 0.3125
mmlu_machine_learning / acc_stderr 0.0440
mmlu_management / acc 0.6311
mmlu_management / acc_stderr 0.0478
mmlu_marketing / acc 0.6368
mmlu_marketing / acc_stderr 0.0315
mmlu_medical_genetics / acc 0.4900
mmlu_medical_genetics / acc_stderr 0.0502
mmlu_miscellaneous / acc 0.6245
mmlu_miscellaneous / acc_stderr 0.0173
mmlu_moral_disputes / acc 0.4913
mmlu_moral_disputes / acc_stderr 0.0269
mmlu_moral_scenarios / acc 0.2469
mmlu_moral_scenarios / acc_stderr 0.0144
mmlu_nutrition / acc 0.5000
mmlu_nutrition / acc_stderr 0.0286
mmlu_other / acc 0.4963
mmlu_other / acc_stderr 0.0087
mmlu_philosophy / acc 0.5177
mmlu_philosophy / acc_stderr 0.0284
mmlu_prehistory / acc 0.5154
mmlu_prehistory / acc_stderr 0.0278
mmlu_professional_accounting / acc 0.3617
mmlu_professional_accounting / acc_stderr 0.0287
mmlu_professional_law / acc 0.3462
mmlu_professional_law / acc_stderr 0.0122
mmlu_professional_medicine / acc 0.4596
mmlu_professional_medicine / acc_stderr 0.0303
mmlu_professional_psychology / acc 0.4395
mmlu_professional_psychology / acc_stderr 0.0201
mmlu_public_relations / acc 0.5091
mmlu_public_relations / acc_stderr 0.0479
mmlu_security_studies / acc 0.4735
mmlu_security_studies / acc_stderr 0.0320
mmlu_social_sciences / acc 0.4985
mmlu_social_sciences / acc_stderr 0.0088
mmlu_sociology / acc 0.6517
mmlu_sociology / acc_stderr 0.0337
mmlu_stem / acc 0.3619
mmlu_stem / acc_stderr 0.0084
mmlu_us_foreign_policy / acc 0.6900
mmlu_us_foreign_policy / acc_stderr 0.0465
mmlu_virology / acc 0.1928
mmlu_virology / acc_stderr 0.0307
mmlu_world_religions / acc 0.6725
mmlu_world_religions / acc_stderr 0.0360
wikitext / bits_per_byte 0.6634
wikitext / bits_per_byte_stderr N/A
wikitext / byte_perplexity 1.5838
wikitext / byte_perplexity_stderr N/A
wikitext / word_perplexity 11.6926
wikitext / word_perplexity_stderr N/A
wmdp_bio_categorized_mcqa / acc 0.2820
wmdp_bio_categorized_mcqa / acc_stderr 0.0125
wmdp_bio_cloze_verified / acc_norm 0.2621
wmdp_bio_cloze_verified / acc_norm_stderr 0.0134
wmdp_bio_robust / acc 0.2719
wmdp_bio_robust / acc_stderr 0.0151
wmdp_bio_robust_bioweapons_and_bioterrorism / acc 0.2737
wmdp_bio_robust_bioweapons_and_bioterrorism / acc_stderr 0.0324
wmdp_bio_robust_dual_use_virology / acc 0.3214
wmdp_bio_robust_dual_use_virology / acc_stderr 0.0899
wmdp_bio_robust_enhanced_potential_pandemic_pathogens / acc 0.2157
wmdp_bio_robust_enhanced_potential_pandemic_pathogens / acc_stderr 0.0409
wmdp_bio_robust_expanding_access_to_threat_vectors / acc 0.3333
wmdp_bio_robust_expanding_access_to_threat_vectors / acc_stderr 0.1054
wmdp_bio_robust_reverse_genetics_and_easy_editing / acc 0.2473
wmdp_bio_robust_reverse_genetics_and_easy_editing / acc_stderr 0.0317
wmdp_bio_robust_rewritten / acc 0.2626
wmdp_bio_robust_rewritten / acc_stderr 0.0089
wmdp_bio_robust_rewritten_gibberish / acc 0.2639
wmdp_bio_robust_rewritten_gibberish / acc_stderr 0.0155
wmdp_bio_robust_rewritten_nonsensical_biology / acc 0.2725
wmdp_bio_robust_rewritten_nonsensical_biology / acc_stderr 0.0156
wmdp_bio_robust_rewritten_real_words_sciency / acc 0.2515
wmdp_bio_robust_rewritten_real_words_sciency / acc_stderr 0.0152
wmdp_bio_robust_viral_vector_research / acc 0.2933
wmdp_bio_robust_viral_vector_research / acc_stderr 0.0247
wmdp_bio_shortcut / acc 0.3037
wmdp_bio_shortcut / acc_stderr 0.0223
wmdp_bio_shortcut_bioweapons_and_bioterrorism / acc 0.5745
wmdp_bio_shortcut_bioweapons_and_bioterrorism / acc_stderr 0.0729
wmdp_bio_shortcut_dual_use_virology / acc 0.4211
wmdp_bio_shortcut_dual_use_virology / acc_stderr 0.1164
wmdp_bio_shortcut_enhanced_potential_pandemic_pathogens / acc 0.2075
wmdp_bio_shortcut_enhanced_potential_pandemic_pathogens / acc_stderr 0.0562
wmdp_bio_shortcut_expanding_access_to_threat_vectors / acc 0.5556
wmdp_bio_shortcut_expanding_access_to_threat_vectors / acc_stderr 0.1757
wmdp_bio_shortcut_reverse_genetics_and_easy_editing / acc 0.2353
wmdp_bio_shortcut_reverse_genetics_and_easy_editing / acc_stderr 0.0463
wmdp_bio_shortcut_viral_vector_research / acc 0.2708
wmdp_bio_shortcut_viral_vector_research / acc_stderr 0.0322
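
One way to read the table above is to turn each accuracy and standard error into an approximate 95% interval and check whether it covers random guessing. The sketch below does this for a few rows copied from the table. It assumes a normal approximation (`acc ± 1.96 * stderr`) and a chance level of 0.25, which holds if the benchmarks are four-option multiple choice. The helper names are hypothetical. It also checks that the reported wikitext byte perplexity matches `2 ** bits_per_byte`.

```python
def confidence_interval(acc, stderr, z=1.96):
    """Approximate 95% interval under a normal approximation."""
    return acc - z * stderr, acc + z * stderr


def consistent_with_chance(acc, stderr, chance=0.25, z=1.96):
    """True when the interval around acc contains the chance level."""
    low, high = confidence_interval(acc, stderr, z)
    return low <= chance <= high


# (acc, acc_stderr) pairs copied from the evaluation table.
results = {
    "wmdp_bio_robust": (0.2719, 0.0151),
    "wmdp_bio_shortcut": (0.3037, 0.0223),
    "mmlu": (0.4405, 0.0041),
}

for name, (acc, stderr) in results.items():
    low, high = confidence_interval(acc, stderr)
    flag = consistent_with_chance(acc, stderr)
    print(f"{name}: [{low:.4f}, {high:.4f}] covers chance: {flag}")

# Byte perplexity is 2 raised to bits per byte.
bits_per_byte = 0.6634
byte_perplexity = 2 ** bits_per_byte
print(f"byte_perplexity from bits_per_byte: {byte_perplexity:.4f}")
```

Under these assumptions, the wmdp_bio_robust interval includes 0.25 while the aggregate mmlu interval lies well above it. Subsets with large standard errors, such as those above 0.1, give intervals too wide to support strong conclusions.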

Model size: 7B params (Safetensors, tensor type BF16)