pythia-160m-c4-english-ppt

Pythia-160M pre-pretrained on English (C4), then pretrained on C4.

Part of an experiment reproducing and extending the pruning analysis from "Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases" (Hu et al., 2025, arXiv:2502.19249).

The hypothesis under test: pre-pretraining on English itself yields attention-head circuits that are as sparse and as transferable as those obtained by pre-pretraining on k-shuffle Dyck.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gizemyc/pythia-160m-c4-english-ppt")
tokenizer = AutoTokenizer.from_pretrained("gizemyc/pythia-160m-c4-english-ppt")
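
Once loaded, the checkpoint can be used like any other causal LM. The sketch below wraps loading and greedy decoding in one helper. The function name `generate_continuation`, the prompt, and the decoding settings are illustrative assumptions rather than part of the original experiment; the imports sit inside the function only so that defining it does not require `torch` or `transformers` to be installed.

```python
def generate_continuation(prompt: str, max_new_tokens: int = 20) -> str:
    """Greedy-decode a short continuation of `prompt` (hypothetical helper)."""
    # Heavy dependencies are imported lazily, at call time.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = "gizemyc/pythia-160m-c4-english-ppt"
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    model.eval()  # disable dropout for inference

    # Tokenize to PyTorch tensors (input_ids and attention_mask).
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding, an assumption for reproducibility
            pad_token_id=tokenizer.eos_token_id,  # avoids the missing-pad-token warning
        )
    # The decoded string contains the prompt followed by the continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_continuation("The quick brown fox")` downloads the weights on first use and returns the prompt plus up to 20 new tokens.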

Training

  • Base architecture: EleutherAI/pythia-160m (12 layers x 12 heads)
  • Pretraining data: C4 (English)
  • Pre-pretraining: English (C4) for this checkpoint (k-shuffle Dyck is the comparison condition in the experiment)
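
To make the "12 layers x 12 heads" figure concrete for a pruning analysis, here is a minimal sketch of a head mask over the 144 attention heads and of the sparsity it implies. This is not the pruning procedure from Hu et al. (2025): the importance scores are random placeholders, and `top_k_head_mask`, `sparsity`, and the choice of k = 36 are hypothetical names and values chosen for illustration.

```python
import random

NUM_LAYERS = 12  # from the model card
NUM_HEADS = 12   # heads per layer, from the model card


def top_k_head_mask(scores, k):
    """Return a NUM_LAYERS x NUM_HEADS 0/1 mask keeping the k highest-scoring heads."""
    flat = [(scores[l][h], l, h) for l in range(NUM_LAYERS) for h in range(NUM_HEADS)]
    flat.sort(reverse=True)  # highest score first
    keep = {(l, h) for _, l, h in flat[:k]}
    return [[1 if (l, h) in keep else 0 for h in range(NUM_HEADS)]
            for l in range(NUM_LAYERS)]


def sparsity(mask):
    """Fraction of heads that are pruned (mask value 0)."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    return 1 - kept / total


# Placeholder scores: a real analysis would substitute learned or measured
# head-importance values here.
rng = random.Random(0)
scores = [[rng.random() for _ in range(NUM_HEADS)] for _ in range(NUM_LAYERS)]

mask = top_k_head_mask(scores, k=36)
print(sum(map(sum, mask)), sparsity(mask))  # 36 heads kept out of 144 -> sparsity 0.75
```

A sparser circuit in this sense is one where a smaller k still preserves task performance, which is the kind of comparison the experiment draws between pre-pretraining conditions.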