## Model design and training
Monad is a 56M-parameter decoder with a standard Qwen/Llama-like design, apart from its extremely compact size and an architecture deliberately opinionated toward depth (64 layers).
<p align="center">
<img width="80%" src="figures/monad_structure.png">
</p>
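
To get an intuition for how a 64-layer decoder can stay around 56M parameters, here is a back-of-the-envelope parameter count for a Llama-style block. The dimensions below are illustrative assumptions only (they are not Monad's published config), picked to show that a deep-and-narrow layout lands in the right ballpark:

```python
# Rough parameter count for a deep, narrow Llama-style decoder.
# All dimensions here are hypothetical, NOT Monad's actual config.

def decoder_params(vocab_size: int, d_model: int, n_layers: int,
                   ffn_dim: int, tied_embeddings: bool = True) -> int:
    """Estimate parameters of a pre-norm decoder with SwiGLU FFNs.

    Simplifications: full multi-head attention (no GQA) and RMSNorm
    (a single weight vector per norm).
    """
    embedding = vocab_size * d_model
    attention = 4 * d_model * d_model            # Wq, Wk, Wv, Wo
    ffn = 3 * d_model * ffn_dim                  # gate, up, down (SwiGLU)
    norms = 2 * d_model                          # two RMSNorms per block
    per_layer = attention + ffn + norms
    total = embedding + n_layers * per_layer + d_model  # + final norm
    if not tied_embeddings:
        total += vocab_size * d_model            # separate LM head
    return total

# Illustrative deep-and-narrow config: 64 layers, small width.
total = decoder_params(vocab_size=16_384, d_model=224, n_layers=64, ffn_dim=896)
print(f"{total / 1e6:.1f}M parameters")
```

With these (hypothetical) dimensions most of the budget sits in the 64 transformer blocks rather than the embeddings, which is the point of an architecture opinionated for depth.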
Monad was trained on 16 H100 GPUs on Jean Zay (compute plan n°A0191016886). Full pre-training took a little under 6 hours.