Drop-Upcycling
llm-jp/Dense-btx-code-expert-152M • 0.2B
llm-jp/Dense-btx-english-expert-1.5B • 2B
llm-jp/Dense-btx-code-expert-1.5B • 2B
llm-jp/Dense-btx-japanese-expert-1.5B • 2B
llm-jp/Dense-btx-english-expert-152M • 0.2B
llm-jp/Dense-btx-japanese-expert-152M • 0.2B
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
Paper • 2502.19261
llm-jp/llm-jp-3-8x13b-instruct3 • Text Generation • 73B
llm-jp/llm-jp-3-8x1.8b-instruct3 • Text Generation • 9B
llm-jp/llm-jp-3-8x13b-instruct2 • Text Generation • 73B
llm-jp/llm-jp-3-8x1.8b-instruct2 • Text Generation • 9B
llm-jp/llm-jp-3.1-8x13b-instruct4 • Text Generation • 73B