Training Dynamics of the Cooldown Stage in Warmup-Stable-Decay Learning Rate Scheduler Paper • 2508.01483 • Published Aug 2, 2025
Apertus: Democratizing Open and Compliant LLMs for Global Language Environments Paper • 2509.14233 • Published Sep 17, 2025 • 15
The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity? Paper • 2601.23045 • Published 3 days ago
FineWeb-HQ datasets Collection • Collection containing FineWeb-HQ and FineWeb2-HQ quality-filtered datasets. • 3 items • Updated Oct 8, 2025
Benchmarking Optimizers for Large Language Model Pretraining Paper • 2509.01440 • Published Sep 1, 2025 • 25
Gradient Clipping Improves AdaGrad when the Noise Is Heavy-Tailed Paper • 2406.04443 • Published Jun 6, 2024
BaCaDI: Bayesian Causal Discovery with Unknown Interventions Paper • 2206.01665 • Published Jun 3, 2022 • 2
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Paper • 2506.20920 • Published Jun 26, 2025 • 77