Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge Paper • 2312.05693 • Published Dec 9, 2023
EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge Paper • 2402.10787 • Published Feb 16, 2024
You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model Paper • 2211.11152 • Published Nov 21, 2022
Fast and Memory-Efficient Video Diffusion Using Streamlined Inference Paper • 2411.01171 • Published Nov 2, 2024