abertsch committed · Commit 38e0eeb · verified · 1 Parent(s): 2f1f9ba

Add README

  1. README.md +44 -0
README.md ADDED
@@ -0,0 +1,44 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ library_name: transformers
+ ---
+
+ # Model Summary
+
+ This is one of the models from the OlmPool set of architectural variations. The final checkpoint for each model has 7-8B parameters and has been trained on 150B tokens (140B in pretraining and 10B in context extension). Note that these models are *early in pretraining* and have seen little-to-no instruction-format data, so they perform very poorly on most tasks.
+
+ For more information about OlmPool, see the **paper**: http://allenai.org/papers/olmpool.
+ # Use
+
+ You **must specify a revision** and set `trust_remote_code=True` to load OlmPool models. The revision is the checkpoint that you would like to load. For instance, to load the final post-context-extension model:
+ ```python
+ from transformers import AutoModel
+ import torch
+
+ DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+
+ model = AutoModel.from_pretrained("allenai/B_post_LQK_32kv_4k_11k_SWA", revision="longcontext-step2385", trust_remote_code=True).to(DEVICE)
+ ```
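Note on the loading snippet above (not part of this commit): since forgetting the revision is an easy mistake, one option is a thin wrapper that makes it a required argument. This is a minimal sketch under stated assumptions. `olmpool_load_kwargs` and `load_olmpool` are hypothetical names, not part of the OlmPool release, and `trust_remote_code` is the keyword that `from_pretrained` accepts for repositories that ship custom model code.

```python
def olmpool_load_kwargs(revision: str) -> dict:
    """Build the keyword arguments the README says every OlmPool load needs."""
    # Reject missing or blank revisions up front instead of loading a default branch.
    if not isinstance(revision, str) or not revision.strip():
        raise ValueError("OlmPool models must be loaded with an explicit revision")
    return {"revision": revision.strip(), "trust_remote_code": True}


def load_olmpool(repo_id: str, revision: str):
    """Load one checkpoint. Hypothetical helper, not part of the OlmPool release."""
    # Imported lazily so the argument check above works without transformers installed.
    from transformers import AutoModel

    return AutoModel.from_pretrained(repo_id, **olmpool_load_kwargs(revision))
```

With this wrapper, `load_olmpool("allenai/B_post_LQK_32kv_4k_11k_SWA", "longcontext-step2385")` would request the same checkpoint as the README example, while a call with an empty revision fails before any download starts.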
+
+ You can list all revisions/branches by installing `huggingface-hub` and running:
+ ```python
+ from huggingface_hub import list_repo_refs
+ out = list_repo_refs("allenai/B_post_LQK_32kv_4k_11k_SWA")
+ branches = [b.name for b in out.branches]
+ ```
+
+ Important branches:
+ - `step34000`: Final pretraining checkpoint
+ - `longcontext-step2385`: Final long context checkpoint
+
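Note on the branch list above (not part of this commit): if a repository exposes more checkpoints than these two, the final one for each stage can be picked out of a list of branch names programmatically. The sketch below assumes the naming scheme implied by the two listed branches, `step<N>` for pretraining and `longcontext-step<N>` for context extension. Both that scheme and the helper name `latest_checkpoints` are assumptions.

```python
import re

# Assumed naming scheme, inferred from the two branches listed in the README:
# "step<N>" for pretraining and "longcontext-step<N>" for context extension.
_BRANCH_RE = re.compile(r"^(longcontext-)?step(\d+)$")


def latest_checkpoints(branches):
    """Return the highest-step branch name for each stage, skipping other names."""
    latest = {}
    for name in branches:
        match = _BRANCH_RE.match(name)
        if match is None:
            continue  # not a checkpoint branch under the assumed scheme, e.g. "main"
        stage = "longcontext" if match.group(1) else "pretraining"
        step = int(match.group(2))
        # Keep only the branch with the largest step number seen so far per stage.
        if stage not in latest or step > latest[stage][0]:
            latest[stage] = (step, name)
    return {stage: name for stage, (_, name) in latest.items()}
```

The function takes any iterable of branch-name strings, such as the names returned by `list_repo_refs`, and returns a dictionary with at most two keys, `"pretraining"` and `"longcontext"`.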
+ # Citation
+
+ ```bibtex
+ @misc{bertsch2026cracks,
+ title={Cracks in the Foundation: Seemingly Minor Architectural Choices Impact Long Context Extension},
+ author={Amanda Bertsch and Luca Soldaini and Matthew R. Gormley and Graham Neubig and Hanna Hajishirzi and Kyle Lo and Dirk Groeneveld},
+ year={2026},
+ }
+ ```