ISTA-DASLab/DeepSeek-V3-0324-GPTQ-4b-128g-experts
Text Generation • 104B • Updated • 88 • 3
DASH: Faster Shampoo via Batched Block Preconditioning and Efficient Inverse-Root Solvers
Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation