Stratified-Datasets (100K-1M) [Pre-Training, IF, Reasoning] - an AmanPriyanshu Collection

AmanPriyanshu's Collections

FORMAT: Search - Retrieve RLVR

FORMAT: Reasoning Datasets - DeepSeek Format

FORMAT: Tool-Use Datasets - Hermes-Reasoning-Tool-Use Format

Stratified-Datasets (100K-1M) [Pre-Training, IF, Reasoning]

GPT-OSS Pruned Experts (4.2B-20B) [IF, Science, Math, etc.]

GPT-OSS General (4.2B to 20B)

GPT-OSS Harmful (4.2B to 20B)

GPT-OSS Math (4.2B to 20B)

GPT-OSS Health / Medicine (4.2B to 20B)

GPT-OSS Law (4.2B to 20B)

GPT-OSS Instruction Following (4.2B to 20B)

GPT-OSS Safety (4.2B to 20B)

GPT-OSS Science (4.2B to 20B)

Stratified-Datasets (100K-1M) [Pre-Training, IF, Reasoning]

updated Oct 8, 2025

Diverse datasets for pre-training, instruction following, and reasoning.