# Foundry-LLM-1.2B-800B

A 1.2B parameter language model pretrained on 800B tokens, part of the VLA Foundry model collection.

## Model Description

- **Architecture:** Transformer (24 layers, 2048 hidden dim, 16 heads, SwiGLU FFN, RoPE, QK-norm)
- **Parameters:** 1.2B (non-embedding)
- **Tokenizer:** SmolVLM2 (vocab size 49,280)
- **Training data:** 800B tokens from DCLM-Baseline-1.0
- **LR schedule:** Warmup + constant (no decay)
- **Sequence length:** 2048

This is an earlier checkpoint of the Foundry LLM and serves as the language backbone for the downstream VLM and VLA models.
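As a sanity check on the configuration above, a back-of-the-envelope count of non-embedding parameters lands at roughly 1.2B. The SwiGLU FFN width is not stated in this card, so the value below assumes a common sizing of about (8/3) × hidden dim:

```python
# Rough non-embedding parameter count for the listed architecture.
# d_ff is an assumption (~(8/3) * d_model, a common SwiGLU choice);
# the actual value is not stated in this card.
n_layers, d_model = 24, 2048
d_ff = 5461  # assumed

attn_params = 4 * d_model * d_model   # Q, K, V, O projections
ffn_params = 3 * d_model * d_ff       # SwiGLU: gate, up, down matrices
per_layer = attn_params + ffn_params  # norms contribute only ~O(d_model)

total = n_layers * per_layer
print(f"~{total / 1e9:.2f}B non-embedding parameters")
```

Under these assumptions the estimate comes out to about 1.21B, consistent with the stated 1.2B.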

## Evaluation Results

Multiple-choice reasoning benchmarks:

| HellaSwag | MMLU | ARC-e | ARC-c | PIQA | WinoGrande | OpenBookQA | BoolQ |
|-----------|------|-------|-------|------|------------|------------|-------|
| 64.3      | 26.0 | 70.3  | 37.0  | 75.8 | 60.9       | 40.0       | 63.2  |

## Usage

Install the package from source:

```bash
git clone https://github.com/TRI-ML/vla_foundry.git
cd vla_foundry
pip install -e .
```

Then load the model:

```python
from vla_foundry.models.base_model import BaseModel

model = BaseModel.from_pretrained("TRI-ML/Foundry-LLM-1.2B-800B")
```
