Knowledge Distillation

Models:
- shayekh/aya8b-distillkit-hidden
- shayekh/aya8b-distillkit-logits (0.6B)
Papers:
- Less is More: Task-aware Layer-wise Distillation for Language Model Compression (arXiv:2210.01351)
- A Survey on Knowledge Distillation of Large Language Models (arXiv:2402.13116)
- Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling (arXiv:2311.00430)
- On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes (arXiv:2306.13649)
- Compact Language Models via Pruning and Knowledge Distillation (arXiv:2407.14679)
- LLM Pruning and Distillation in Practice: The Minitron Approach (arXiv:2408.11796)
- DistiLLM: Towards Streamlined Distillation for Large Language Models (arXiv:2402.03898)
- Relational Knowledge Distillation (arXiv:1904.05068)
- Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes (arXiv:2305.02301)
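The models in this collection distill via logits and hidden states. As background for the papers above, here is a minimal sketch of the classic temperature-scaled logit-distillation loss (Hinton-style KL between softened teacher and student distributions); the function names are illustrative and not taken from any listed repository.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stable)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Identical logits give zero loss; mismatched logits give a positive loss.
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # 0.0
print(distillation_loss([3.0, 0.0, 0.0], [0.0, 0.0, 3.0]) > 0)  # True
```

In practice this term is mixed with the ordinary cross-entropy on gold labels; hidden-state distillation (as in the `-hidden` model) instead matches intermediate representations, typically with an MSE loss.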