Add README
README.md
ADDED
|
@@ -0,0 +1,44 @@
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
language:
|
| 4 |
+
- en
|
| 5 |
+
library_name: transformers
|
| 6 |
+
---
|
| 7 |
+
|
| 8 |
+
# Model Summary
|
| 9 |
+
|
| 10 |
+
This is one of the models from the OlmPool set of architectural variations. The final checkpoint for each model is a 7-8B parameter model trained on 150B tokens (140B in pretraining and 10B in context extension). Note that these models are *early in pretraining* and have seen little to no instruction-format data, so they perform very poorly on most tasks.
|
| 11 |
+
|
| 12 |
+
For more information about OlmPool, see the **paper**: http://allenai.org/papers/olmpool.
|
| 13 |
+
# Use
|
| 14 |
+
|
| 15 |
+
You **must specify a revision** and set `trust_remote_code=True` to load OlmPool models. The revision is the checkpoint that you would like to load. For instance, to load the final post-context-extension model:
|
| 16 |
+
```python
|
| 17 |
+
from transformers import AutoModel
|
| 18 |
+
import torch
|
| 19 |
+
|
| 20 |
+
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
|
| 21 |
+
|
| 22 |
+
model = AutoModel.from_pretrained("allenai/B_post_LQK_32kv_4k_11k_SWA", revision="longcontext-step2385", trust_remote_code=True).to(DEVICE)
|
| 23 |
+
```
|
| 24 |
+
|
| 25 |
+
To list all available revisions (branches), install `huggingface-hub` and run:
|
| 26 |
+
```python
|
| 27 |
+
from huggingface_hub import list_repo_refs
|
| 28 |
+
out = list_repo_refs("allenai/B_post_LQK_32kv_4k_11k_SWA")
|
| 29 |
+
branches = [b.name for b in out.branches]
|
| 30 |
+
```
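
Once the branch names are retrieved, it can help to group them by training stage. The sketch below assumes that intermediate checkpoints follow the same `step<N>` and `longcontext-step<N>` naming as the two branches named in this README. That pattern and the helper name `group_checkpoints` are assumptions for illustration, not documented behavior of the repository.

```python
import re

# Assumed naming pattern, inferred from `step34000` and `longcontext-step2385`.
_BRANCH_RE = re.compile(r"^(longcontext-)?step(\d+)$")


def group_checkpoints(branches):
    """Split branch names into pretraining and long-context lists, sorted by step.

    Hypothetical helper: names that do not match the assumed pattern
    (for example `main`) are ignored.
    """
    groups = {"pretraining": [], "longcontext": []}
    for name in branches:
        match = _BRANCH_RE.match(name)
        if match is None:
            continue
        stage = "longcontext" if match.group(1) else "pretraining"
        # Store the numeric step so sorting is by step, not by string order.
        groups[stage].append((int(match.group(2)), name))
    return {stage: [name for _, name in sorted(items)] for stage, items in groups.items()}
```

If the naming assumption holds, `group_checkpoints(branches)["longcontext"][-1]` would give the latest long-context checkpoint, which can then be passed as `revision`.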
|
| 31 |
+
|
| 32 |
+
Important branches:
|
| 33 |
+
- `step34000`: Final pretraining checkpoint
|
| 34 |
+
- `longcontext-step2385`: Final long context checkpoint
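
As a minimal sketch, the two branches above can be wrapped in a small lookup so that a stage name selects the revision. The stage labels and the function `load_final_checkpoint` are hypothetical names introduced here; `revision` and `trust_remote_code` are the standard `from_pretrained` arguments in Transformers.

```python
# Branch names taken from the list above; the stage keys are illustrative labels.
FINAL_CHECKPOINTS = {
    "pretraining": "step34000",
    "longcontext": "longcontext-step2385",
}


def load_final_checkpoint(stage, repo_id="allenai/B_post_LQK_32kv_4k_11k_SWA"):
    """Load the final checkpoint for a stage (hypothetical convenience wrapper)."""
    if stage not in FINAL_CHECKPOINTS:
        raise ValueError(
            f"unknown stage {stage!r}; expected one of {sorted(FINAL_CHECKPOINTS)}"
        )
    # Imported lazily so the stage check above does not require transformers.
    from transformers import AutoModel

    return AutoModel.from_pretrained(
        repo_id,
        revision=FINAL_CHECKPOINTS[stage],
        trust_remote_code=True,  # allows the repository's custom model code to run
    )
```

Calling `load_final_checkpoint("pretraining")` would download the `step34000` weights, so it needs network access and enough memory for a 7-8B model.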
|
| 35 |
+
|
| 36 |
+
# Citation
|
| 37 |
+
|
| 38 |
+
```bibtex
|
| 39 |
+
@misc{bertsch2026cracks,
|
| 40 |
+
title={Cracks in the Foundation: Seemingly Minor Architectural Choices Impact Long Context Extension},
|
| 41 |
+
author={Amanda Bertsch and Luca Soldaini and Matthew R. Gormley and Graham Neubig and Hanna Hajishirzi and Kyle Lo and Dirk Groeneveld},
|
| 42 |
+
year={2026},
|
| 43 |
+
}
|
| 44 |
+
```
|