pythia-160m-c4-english-ppt

Pythia-160M pre-pretrained on English (C4), then pretrained on C4.

Part of an experiment reproducing and extending the pruning analysis from "Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases" (Hu et al., 2025, arXiv:2502.19249).

The hypothesis under test: pre-pretraining on English itself yields attention-head circuits that are as sparse and as transferable as those obtained by pre-pretraining on k-shuffle Dyck.
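To make "sparse" and "transferable" concrete, a pruned circuit can be represented as the set of (layer, head) pairs that survive pruning. The sketch below uses two hypothetical proxies, which are not necessarily the metrics used by Hu et al. The first is sparsity, the fraction of Pythia-160M's 144 heads that are pruned away. The second is transfer, the Jaccard overlap between head sets retained for two different evaluations, for example one on the pre-pretraining language and one on English. The function names are illustrative.

```python
from typing import AbstractSet, Tuple

# A head is identified by (layer, head) indices; Pythia-160M has 12 x 12 = 144 heads.
Head = Tuple[int, int]
N_LAYERS, N_HEADS = 12, 12


def circuit_sparsity(kept: AbstractSet[Head],
                     n_layers: int = N_LAYERS,
                     n_heads: int = N_HEADS) -> float:
    """Hypothetical sparsity proxy: fraction of all heads pruned away.

    1.0 means every head was pruned; 0.0 means none were.
    """
    for layer, head in kept:
        if not (0 <= layer < n_layers and 0 <= head < n_heads):
            raise ValueError(f"head {(layer, head)} is out of range")
    return 1.0 - len(kept) / (n_layers * n_heads)


def head_overlap(a: AbstractSet[Head], b: AbstractSet[Head]) -> float:
    """Hypothetical transfer proxy: Jaccard overlap of two retained-head sets."""
    union = set(a) | set(b)
    if not union:
        return 1.0  # two empty circuits are trivially identical
    return len(set(a) & set(b)) / len(union)
```

For instance, a circuit that keeps 3 of the 144 heads has a sparsity of 141/144 (about 0.979). Two 3-head circuits that share 2 heads have an overlap of 2/4 = 0.5.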

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gizemyc/pythia-160m-c4-english-ppt")
tokenizer = AutoTokenizer.from_pretrained("gizemyc/pythia-160m-c4-english-ppt")
```
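A common first check after loading is the model's perplexity on a piece of English text. The sketch below assumes `torch` and `transformers` are installed. It relies on standard Hugging Face causal-LM behavior: passing `labels=input_ids` returns the mean next-token cross-entropy as `loss`. The helper names `perplexity_from_loss` and `score_text` are illustrative and are not part of this repository.

```python
import math

MODEL_ID = "gizemyc/pythia-160m-c4-english-ppt"


def perplexity_from_loss(mean_nll: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity."""
    return math.exp(mean_nll)


def score_text(text: str, model_id: str = MODEL_ID) -> float:
    """Return the model's perplexity on `text` (downloads the checkpoint on first use).

    torch/transformers are imported lazily so the pure helper above can be
    reused without them.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()

    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels=input_ids, the model shifts labels internally and
        # returns the mean next-token cross-entropy as `loss`.
        out = model(**enc, labels=enc["input_ids"])
    return perplexity_from_loss(out.loss.item())
```

Usage: `score_text("The quick brown fox jumps over the lazy dog.")`. The text needs at least two tokens so that there is something to predict.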

Training

Base architecture: EleutherAI/pythia-160m (12 layers x 12 heads)
Pretraining data: C4 (English)
Pre-pretraining data: English (C4) for this model (the comparison condition in this experiment uses k-shuffle Dyck)
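k-shuffle Dyck, the comparison condition, is commonly defined as the shuffle of k Dyck-1 languages. Each of the k bracket types must be balanced on its own, but brackets of different types may cross, so `( [ ) ]` is valid. The sketch below encodes open brackets as tokens 0..k-1 and their matching closers as k..2k-1. This encoding, the sampler, and its `p_open` parameter are hypothetical and may not match the data generator used for the pre-pretraining runs.

```python
import random
from typing import List, Optional, Sequence


def is_k_shuffle_dyck(tokens: Sequence[int], k: int) -> bool:
    """Membership check: token t < k opens bracket type t; token t + k closes it.

    Each type must be balanced independently; types may interleave freely.
    """
    depth = [0] * k
    for tok in tokens:
        if 0 <= tok < k:
            depth[tok] += 1
        elif k <= tok < 2 * k:
            depth[tok - k] -= 1
            if depth[tok - k] < 0:  # closed a type that was not open
                return False
        else:
            raise ValueError(f"token {tok} is outside the vocabulary of size {2 * k}")
    return all(d == 0 for d in depth)


def sample_k_shuffle_dyck(k: int, length: int, p_open: float = 0.5,
                          rng: Optional[random.Random] = None) -> List[int]:
    """Hypothetical sampler: returns a valid k-shuffle Dyck sequence of even `length`.

    When both moves are allowed, it opens a random type with probability
    `p_open`; otherwise it closes a random currently-open type.
    """
    if length % 2:
        raise ValueError("length must be even")
    rng = rng or random.Random()
    depth = [0] * k
    open_total = 0
    out: List[int] = []
    for i in range(length):
        remaining = length - i
        must_close = open_total == remaining  # every remaining slot must close
        can_close = open_total > 0
        if must_close or (can_close and rng.random() >= p_open):
            t = rng.choice([j for j in range(k) if depth[j] > 0])
            depth[t] -= 1
            open_total -= 1
            out.append(t + k)
        else:
            t = rng.randrange(k)
            depth[t] += 1
            open_total += 1
            out.append(t)
    return out
```

Because `open_total` and `remaining` always have the same parity, opening a bracket never makes the sequence impossible to finish. The forced closes at the end therefore guarantee that every sample passes `is_k_shuffle_dyck`.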

Checkpoint: Safetensors, 0.2B params, tensor type BF16

Paper: Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases (arXiv:2502.19249, published May 27, 2025)