Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention — Paper 2510.04212, published Oct 5, 2025
meituan-longcat/LongCat-Flash-Thinking-ZigZag — Text Generation model, 562B parameters
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models — Paper 2601.07372, published Jan 12