YAYI 2: Multilingual Open-Source Large Language Models
• arXiv:2312.14862
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
• arXiv:2312.15166
TrustLLM: Trustworthiness in Large Language Models
• arXiv:2401.05561
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
• arXiv:2401.06066
LLaMA Pro: Progressive LLaMA with Block Expansion
• arXiv:2401.02415
Composable Function-preserving Expansions for Transformer Architectures
• arXiv:2308.06103
Thinking Like Transformers
• arXiv:2106.06981
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
• arXiv:2401.12474
BitNet: Scaling 1-bit Transformers for Large Language Models
• arXiv:2310.11453
Specialized Language Models with Cheap Inference from Limited Domain Data
• arXiv:2402.01093
BlackMamba: Mixture of Experts for State-Space Models
• arXiv:2402.01771
Code Representation Learning At Scale
• arXiv:2402.01935
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
• arXiv:2402.03300
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
• arXiv:2402.01739
Rethinking Optimization and Architecture for Tiny Language Models
• arXiv:2402.02791
Scaling Laws for Fine-Grained Mixture of Experts
• arXiv:2402.07871
A Tale of Tails: Model Collapse as a Change of Scaling Laws
• arXiv:2402.07043
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
• arXiv:2402.07827
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models
• arXiv:2402.10986
TEQ: Trainable Equivalent Transformation for Quantization of LLMs
• arXiv:2310.10944
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models
• arXiv:2403.00818
Rho-1: Not All Tokens Are What You Need
• arXiv:2404.07965
Pre-training Small Base LMs with Fewer Tokens
• arXiv:2404.08634
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
• arXiv:2405.15071
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models
• arXiv:2407.12327
BitNet a4.8: 4-bit Activations for 1-bit LLMs
• arXiv:2411.04965