-
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Paper • 2402.15627 • Published • 36 -
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Paper • 2402.16822 • Published • 17 -
FuseChat: Knowledge Fusion of Chat Models
Paper • 2402.16107 • Published • 39 -
Multi-LoRA Composition for Image Generation
Paper • 2402.16843 • Published • 31
Collections
Discover the best community collections!
Collections including paper arxiv:2408.07055
-
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
Paper • 2401.03462 • Published • 29 -
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
Paper • 2305.07185 • Published • 10 -
YaRN: Efficient Context Window Extension of Large Language Models
Paper • 2309.00071 • Published • 83 -
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
Paper • 2401.02669 • Published • 17
-
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper • 2309.09400 • Published • 87 -
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 40 -
FreeU: Free Lunch in Diffusion U-Net
Paper • 2309.11497 • Published • 66 -
A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
Paper • 2309.11674 • Published • 33
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 153 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 14 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 59 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 47
-
Chain-of-Verification Reduces Hallucination in Large Language Models
Paper • 2309.11495 • Published • 40 -
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 82 -
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper • 2309.09400 • Published • 87 -
Language Modeling Is Compression
Paper • 2309.10668 • Published • 85
-
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Paper • 2402.15627 • Published • 36 -
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Paper • 2402.16822 • Published • 17 -
FuseChat: Knowledge Fusion of Chat Models
Paper • 2402.16107 • Published • 39 -
Multi-LoRA Composition for Image Generation
Paper • 2402.16843 • Published • 31
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 153 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 14 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 59 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 47
-
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
Paper • 2401.03462 • Published • 29 -
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
Paper • 2305.07185 • Published • 10 -
YaRN: Efficient Context Window Extension of Large Language Models
Paper • 2309.00071 • Published • 83 -
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
Paper • 2401.02669 • Published • 17
-
Chain-of-Verification Reduces Hallucination in Large Language Models
Paper • 2309.11495 • Published • 40 -
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 82 -
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper • 2309.09400 • Published • 87 -
Language Modeling Is Compression
Paper • 2309.10668 • Published • 85
-
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper • 2309.09400 • Published • 87 -
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 40 -
FreeU: Free Lunch in Diffusion U-Net
Paper • 2309.11497 • Published • 66 -
A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
Paper • 2309.11674 • Published • 33