Collections
Collections including paper arxiv:2401.02415
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 109
- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 42
- BitDelta: Your Fine-Tune May Only Be Worth One Bit
  Paper • 2402.10193 • Published • 22
- A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
  Paper • 2402.09727 • Published • 38

- Mixtral of Experts
  Paper • 2401.04088 • Published • 160
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
  Paper • 2401.04081 • Published • 73
- TinyLlama: An Open-Source Small Language Model
  Paper • 2401.02385 • Published • 95
- LLaMA Pro: Progressive LLaMA with Block Expansion
  Paper • 2401.02415 • Published • 53

- Understanding LLMs: A Comprehensive Overview from Training to Inference
  Paper • 2401.02038 • Published • 65
- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 189
- LLaMA Beyond English: An Empirical Study on Language Capability Transfer
  Paper • 2401.01055 • Published • 55
- LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
  Paper • 2401.01325 • Published • 27

- LLaMA Pro: Progressive LLaMA with Block Expansion
  Paper • 2401.02415 • Published • 53
- Datasheets for Datasets
  Paper • 1803.09010 • Published • 3
- BitDelta: Your Fine-Tune May Only Be Worth One Bit
  Paper • 2402.10193 • Published • 22
- PockEngine: Sparse and Efficient Fine-tuning in a Pocket
  Paper • 2310.17752 • Published • 15

- LLaMA Pro: Progressive LLaMA with Block Expansion
  Paper • 2401.02415 • Published • 53
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 109
- BitDelta: Your Fine-Tune May Only Be Worth One Bit
  Paper • 2402.10193 • Published • 22
- Generative Representational Instruction Tuning
  Paper • 2402.09906 • Published • 54