# Model Card
- Source: [https://arxiv.org/abs/2509.02046](https://arxiv.org/abs/2509.02046)
- Optimizer: `kron`
- Model size: `300m`
- Data size: `24B`
## Best configuration
| Hyperparameter | Value |
|---|---|
| beta1 | `0.95` |
| block_size | `256` |
| learning_rate | `0.0005` |
| max_grad_norm | `1` |
| min_lr_ratio | `0` |
| normalize_grads | `True` |
| partition_grads_into_blocks | `True` |
| preconditioner_init_scale | `1` |
| preconditioner_lr | `0.2` |
| preconditioner_update_probability | `0.1` |
| train_batch_size | `128` |
| update_prob_flat_start | `2000` |
| warmup | `1000` |
| weight_decay | `0.7` |
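
For reuse in a training script, the table above can be captured as a plain Python mapping. The sketch below is illustrative only: `BEST_CONFIG`, `warmup_lr`, and `final_lr` are hypothetical names, not part of any released code. It assumes `warmup` counts optimizer steps, that warmup is linear, and that `min_lr_ratio` scales the peak learning rate to give the final learning rate. The card states none of these, and the decay shape after warmup is left unmodeled.

```python
# Values copied verbatim from the "Best configuration" table.
BEST_CONFIG = {
    "beta1": 0.95,
    "block_size": 256,
    "learning_rate": 0.0005,
    "max_grad_norm": 1,
    "min_lr_ratio": 0,
    "normalize_grads": True,
    "partition_grads_into_blocks": True,
    "preconditioner_init_scale": 1,
    "preconditioner_lr": 0.2,
    "preconditioner_update_probability": 0.1,
    "train_batch_size": 128,
    "update_prob_flat_start": 2000,
    "warmup": 1000,
    "weight_decay": 0.7,
}


def warmup_lr(step, config=BEST_CONFIG):
    """Hypothetical helper: linear warmup to the peak learning rate.

    Assumes `warmup` is measured in optimizer steps. After warmup this
    simply holds the peak value, because the decay shape is not given
    in the card.
    """
    peak = config["learning_rate"]
    warmup_steps = config["warmup"]
    if step >= warmup_steps:
        return peak
    return peak * step / warmup_steps


def final_lr(config=BEST_CONFIG):
    """Assumed reading of `min_lr_ratio`: final LR as a fraction of peak LR."""
    return config["min_lr_ratio"] * config["learning_rate"]
```

Under these assumptions, `warmup_lr(500)` is half the peak learning rate and `final_lr()` is zero, meaning the schedule would decay all the way to zero by the end of training.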