Commit 204387f by kshitijthakkar · verified · 1 Parent(s): 770e930

Update README.md

Files changed (1)
  1. README.md +27 -25
README.md CHANGED
@@ -1,25 +1,27 @@
- ---
- tags:
- - qwen3.5
- - moe
- - hybrid-attention
- - deltanet
- - tiny-test
- license: apache-2.0
- ---
-
- # Qwen3.5 Tiny Test Model
-
- A tiny Qwen3.5 hybrid MoE model for testing and validation purposes.
-
- **This model has random weights and is not trained.** It exists to validate the
- architecture implementation and hub upload pipeline.
-
- ## Architecture
- - **Type**: Hybrid MoE (Gated DeltaNet + Gated Attention)
- - **Parameters**: 138,261,536 total
- - **Layers**: 8 (6 DeltaNet + 2 Full Attention)
- - **Experts**: 8 routed (top-2) + 1 shared
- - **Embedding dim**: 256
- - **Vocab size**: 248,320
- - **Context**: 4096 tokens
+ ---
+ tags:
+ - qwen3.5
+ - moe
+ - hybrid-attention
+ - deltanet
+ - tiny-test
+ - image-text-to-text
+ - transformers
+ license: apache-2.0
+ ---
+
+ # Qwen3.5 Tiny Test Model
+
+ A tiny Qwen3.5 hybrid MoE model for testing and validation purposes.
+
+ **This model has random weights and is not trained.** It exists to validate the
+ architecture implementation and hub upload pipeline.
+
+ ## Architecture
+ - **Type**: Hybrid MoE (Gated DeltaNet + Gated Attention)
+ - **Parameters**: 138,261,536 total
+ - **Layers**: 8 (6 DeltaNet + 2 Full Attention)
+ - **Experts**: 8 routed (top-2) + 1 shared
+ - **Embedding dim**: 256
+ - **Vocab size**: 248,320
+ - **Context**: 4096 tokens
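
The figures in the Architecture list can be sanity-checked with a little arithmetic. The sketch below is illustrative only: `TinyConfig` and its field names are hypothetical and are not taken from the model's actual config. It assumes a single token-embedding matrix of shape vocab size × embedding dim (the README does not say whether the output head is tied or counted separately) and that the shared expert runs for every token alongside the two routed ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TinyConfig:
    """Hypothetical container for the numbers listed in the README."""
    total_params: int = 138_261_536
    num_layers: int = 8
    deltanet_layers: int = 6
    full_attention_layers: int = 2
    routed_experts: int = 8
    experts_per_token: int = 2   # "top-2" routing
    shared_experts: int = 1
    embedding_dim: int = 256
    vocab_size: int = 248_320
    context_length: int = 4096


def embedding_params(cfg: TinyConfig) -> int:
    # Assumption: one embedding matrix of shape (vocab_size, embedding_dim).
    return cfg.vocab_size * cfg.embedding_dim


def experts_run_per_token(cfg: TinyConfig) -> int:
    # Assumption: the shared expert is always active, plus the top-k routed ones.
    return cfg.experts_per_token + cfg.shared_experts


cfg = TinyConfig()

# The layer breakdown should add up to the stated layer count.
assert cfg.deltanet_layers + cfg.full_attention_layers == cfg.num_layers

emb = embedding_params(cfg)
share = emb / cfg.total_params
print(f"embedding parameters: {emb:,}")
print(f"share of total parameters: {share:.1%}")
print(f"experts evaluated per token: {experts_run_per_token(cfg)} of "
      f"{cfg.routed_experts + cfg.shared_experts}")
```

Under these assumptions the embedding table alone holds 63,569,920 parameters, a little under half of the stated total, which is what one would expect from a tiny test model that keeps a full-size vocabulary.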