RAE Collection Collection for Diffusion Transformers with Representation Autoencoders • 7 items • Updated 7 days ago • 11
Article Tokenization in Transformers v5: Simpler, Clearer, and More Modular +4 Dec 18, 2025 • 120
Pre-training Dataset Samples Collection A collection of pre-training datasets samples of sizes 10M, 100M and 1B tokens. Ideal for use in quick experimentation and ablations. • 19 items • Updated Dec 25, 2025 • 18
Article A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons Feb 4, 2025 • 30
AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions Paper • 2509.13523 • Published Sep 16, 2025 • 7
Nemotron-Pre-Training-Datasets Collection Large scale pre-training datasets used in the Nemotron family of models. • 11 items • Updated 6 days ago • 99