kshitijthakkar/qwen3.5-tiny-test

Tags: Image-Text-to-Text, Mixture of Experts, hybrid-attention

kshitijthakkar committed on Feb 25

Commit 204387f (verified) · 1 parent: 770e930

Update README.md

Files changed (1)

README.md +27 -25

@@ -1,25 +1,27 @@
----
-tags:
-- qwen3.5
-- moe
-- hybrid-attention
-- deltanet
-- tiny-test
-license: apache-2.0
----
-# Qwen3.5 Tiny Test Model
-A tiny Qwen3.5 hybrid MoE model for testing and validation purposes.
-**This model has random weights and is not trained.** It exists to validate the
-architecture implementation and hub upload pipeline.
-## Architecture
-- **Type**: Hybrid MoE (Gated DeltaNet + Gated Attention)
-- **Parameters**: 138,261,536 total
-- **Layers**: 8 (6 DeltaNet + 2 Full Attention)
-- **Experts**: 8 routed (top-2) + 1 shared
-- **Embedding dim**: 256
-- **Vocab size**: 248,320
-- **Context**: 4096 tokens

+---
+tags:
+- qwen3.5
+- moe
+- hybrid-attention
+- deltanet
+- tiny-test
+- image-text-to-text
+- transformers
+license: apache-2.0
+---
+# Qwen3.5 Tiny Test Model
+A tiny Qwen3.5 hybrid MoE model for testing and validation purposes.
+**This model has random weights and is not trained.** It exists to validate the
+architecture implementation and hub upload pipeline.
+## Architecture
+- **Type**: Hybrid MoE (Gated DeltaNet + Gated Attention)
+- **Parameters**: 138,261,536 total
+- **Layers**: 8 (6 DeltaNet + 2 Full Attention)
+- **Experts**: 8 routed (top-2) + 1 shared
+- **Embedding dim**: 256
+- **Vocab size**: 248,320
+- **Context**: 4096 tokens
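
The Architecture figures above can be sanity-checked with a little arithmetic. The sketch below is not part of the commit. It uses only the numbers listed in the README and makes two assumptions that the README does not confirm: that the embedding table is a single vocab-by-dim matrix (it may or may not be tied to the output head), and that the two full-attention layers are placed at a fixed interval of 4. The `layer_types` helper and its `full_attention_interval` parameter are hypothetical names chosen for illustration.

```python
# Figures copied from the README's Architecture list.
TOTAL_PARAMS = 138_261_536
VOCAB_SIZE = 248_320
EMBED_DIM = 256
NUM_LAYERS = 8

# Size of one vocab-by-dim embedding matrix.
# Assumption: a separate, untied output head would add the same amount again.
embedding_params = VOCAB_SIZE * EMBED_DIM


def layer_types(num_layers: int, full_attention_interval: int) -> list[str]:
    """Hypothetical layout: every `full_attention_interval`-th layer is full attention."""
    return [
        "full_attention" if (i + 1) % full_attention_interval == 0 else "deltanet"
        for i in range(num_layers)
    ]


# Assumption: interval of 4, which yields the 6 + 2 split stated in the README.
pattern = layer_types(NUM_LAYERS, full_attention_interval=4)

print(f"embedding matrix: {embedding_params:,} parameters")
print(f"share of total (one matrix): {embedding_params / TOTAL_PARAMS:.1%}")
print(f"layer pattern: {pattern}")
```

Under these assumptions, one embedding matrix holds 63,569,920 parameters, a bit under half of the stated total. That is expected for a tiny test model that keeps a full-size vocabulary: the hidden size shrinks, but the vocabulary does not.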