kshitijthakkar's Collections

Qwen3.5 Dense-to-MoE Weight Transfer

Updated Mar 4

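Qwen3.5 MoE models built by dual-source weight transfer: the backbone weights come from a dense Qwen3.5 model and the expert weights come from the 35B-A3B MoE model. The models use hybrid attention that combines DeltaNet (linear attention) layers with GQA (grouped-query attention) layers.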
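The collection does not say how the two sources are combined. The following is a minimal sketch of one plausible reading of "dual-source" transfer, not the author's method. It assumes both checkpoints are plain state dicts that use a hypothetical `model.layers.<i>.…` key scheme, that each layer's MoE block (router and experts) lives under `mlp.`, that tensor shapes already match, and that a hypothetical `layer_map` picks which donor layer supplies each target layer's experts.

```python
import re
from collections import defaultdict
from typing import Dict, List, Mapping, TypeVar

T = TypeVar("T")  # tensor type, e.g. torch.Tensor; any value works for illustration

# Hypothetical key scheme for decoder layers: "model.layers.<index>.<rest>".
_LAYER_KEY = re.compile(r"^model\.layers\.(\d+)\.(.+)$")


def dual_source_merge(
    dense_sd: Mapping[str, T],
    moe_sd: Mapping[str, T],
    layer_map: Mapping[int, int],
) -> Dict[str, T]:
    """Combine a dense backbone with MoE blocks taken from a donor checkpoint.

    layer_map maps each target layer index to the donor layer whose MoE block
    (router + experts, stored under "mlp.") it receives. Shapes are assumed to
    already match; any resizing or projection is out of scope.
    """
    merged: Dict[str, T] = {}

    # 1) Backbone: copy everything from the dense model except its dense FFN ("mlp.") blocks.
    for name, tensor in dense_sd.items():
        m = _LAYER_KEY.match(name)
        if m and m.group(2).startswith("mlp."):
            continue  # the dense FFN is replaced by a MoE block in step 2
        merged[name] = tensor

    # Invert the map so one donor layer can feed several target layers.
    targets_by_donor: Dict[int, List[int]] = defaultdict(list)
    for target, donor in layer_map.items():
        targets_by_donor[donor].append(target)

    # 2) Experts: copy each selected donor layer's "mlp." tensors under the target layer index.
    for name, tensor in moe_sd.items():
        m = _LAYER_KEY.match(name)
        if not m or not m.group(2).startswith("mlp."):
            continue  # donor embeddings, attention and norms are not used
        for target in targets_by_donor.get(int(m.group(1)), []):
            merged[f"model.layers.{target}.{m.group(2)}"] = tensor

    return merged
```

The explicit `layer_map` matters because a small dense backbone usually has fewer layers than a 35B-A3B donor, so some rule has to decide which donor layers are kept; the sketch leaves that choice to the caller.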


  • kshitijthakkar/qwen3.5-moe-0.87B-d0.8B

    Image-Text-to-Text • 1B • Updated Mar 4 • 31

  • kshitijthakkar/qwen3.5-moe-2.3B-d2B

    Image-Text-to-Text • 3B • Updated Mar 4 • 4

  • kshitijthakkar/qwen3.5-moe-4.7B-d4B

    Image-Text-to-Text • 5B • Updated Mar 4 • 29

  • kshitijthakkar/qwen3.5-tiny-test

    Image-Text-to-Text • 0.1B • Updated Feb 25 • 4

  • kshitijthakkar/qwen3.5-from-scratch-tiny

    2B • Updated Mar 4 • 5

  • kshitijthakkar/qwen3.5-0.8b-moe-from-scratch

    3B • Updated Mar 4 • 2