# Model Card
- Source: [https://arxiv.org/abs/2509.02046](https://arxiv.org/abs/2509.02046)
- Optimizer: `adamw`
- Model size: `1.2b`
- Data size: `193B` tokens
## Best configuration
| Hyperparameter | Value |
|---|---|
| beta1 | `0.9` |
| beta2 | `0.98` |
| epsilon | `1e-10` |
| learning_rate | `0.002` |
| max_grad_norm | `1` |
| min_lr_ratio | `0.0` |
| nesterov | `False` |
| train_batch_size | `256` |
| warmup | `1000` |
| weight_decay | `0.2` |
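
To make the role of each value in the table concrete, here is a minimal pure-Python sketch of one AdamW update using those hyperparameters. It is a textbook formulation, not the training code behind this checkpoint. It makes several assumptions: `warmup` is read as a number of steps with a linear ramp, weight decay is decoupled and scaled by the learning rate, and `max_grad_norm` is applied as global-norm clipping. The helper names (`CONFIG`, `warmup_lr`, `clip_by_global_norm`, `adamw_step`) are hypothetical, and the decay schedule after warmup is omitted because the card does not state its shape.

```python
import math

# Values copied from the "Best configuration" table.
CONFIG = {
    "beta1": 0.9,
    "beta2": 0.98,
    "epsilon": 1e-10,
    "learning_rate": 0.002,
    "max_grad_norm": 1.0,
    "warmup": 1000,
    "weight_decay": 0.2,
}


def warmup_lr(step, cfg=CONFIG):
    """Linear ramp to the peak rate (assumption: 'warmup' counts steps)."""
    return cfg["learning_rate"] * min(1.0, step / cfg["warmup"])


def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so their joint L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grads]


def adamw_step(params, grads, m, v, step, lr, cfg=CONFIG):
    """One textbook AdamW update on flat lists of floats (step is 1-based)."""
    b1, b2 = cfg["beta1"], cfg["beta2"]
    eps, wd = cfg["epsilon"], cfg["weight_decay"]
    grads = clip_by_global_norm(grads, cfg["max_grad_norm"])
    new_params, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(params, grads, m, v):
        mi = b1 * mi + (1 - b1) * g          # first-moment estimate
        vi = b2 * vi + (1 - b2) * g * g      # second-moment estimate
        m_hat = mi / (1 - b1 ** step)        # bias correction
        v_hat = vi / (1 - b2 ** step)
        # Decoupled weight decay (assumed to be scaled by lr).
        p = p - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * p)
        new_params.append(p)
        new_m.append(mi)
        new_v.append(vi)
    return new_params, new_m, new_v


# Usage: a single update on one scalar parameter at the peak learning rate.
params, m, v = adamw_step([1.0], [0.5], [0.0], [0.0], step=1,
                          lr=CONFIG["learning_rate"])
```

Frameworks differ in how they scale weight decay and where they apply clipping, so check the training code referenced by the source paper before reusing these values elsewhere.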