Pre-trained models in MiniPLM: Knowledge Distillation for Pre-Training Language Models
| Model | Task | Params | Downloads | Likes |
|---|---|---|---|---|
| MiniLLM/MiniPLM-Qwen-200M | Text Generation | 0.2B | 470 | 9 |
| MiniLLM/MiniPLM-Qwen-500M | Text Generation | 0.5B | 63 | 7 |
| MiniLLM/MiniPLM-Qwen-1.2B | Text Generation | 1.2B | 39 | 4 |
| MiniLLM/MiniPLM-Mamba-130M | Text Generation | 0.1B | 13 | 3 |
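A minimal sketch of loading one of these checkpoints, assuming the Qwen-based models expose standard `transformers` AutoClass configs (the prompt and generation settings below are illustrative, not taken from the MiniPLM repo):

```python
# Illustrative only: load a MiniPLM checkpoint from the Hub and generate text.
# Assumes the checkpoint works with the standard AutoClasses; the Mamba
# variant may need additional dependencies.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniLLM/MiniPLM-Qwen-200M"  # any model from the table above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Knowledge distillation is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```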