- Prompt Cache: Modular Attention Reuse for Low-Latency Inference
  Paper • 2311.04934 • Published • 33
- Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
  Paper • 2311.08692 • Published • 13
- Exponentially Faster Language Modelling
  Paper • 2311.10770 • Published • 119
- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 19