scion_130m_2 / README.md

Model Card

Best configuration

| Hyperparameter | Value |
|---|---|
| beta1 | 0.95 |
| decay | 1 |
| learning_rate | 0.016 |
| lr_schedule | linear |
| max_grad_norm | 1 |
| min_lr_ratio | 0 |
| momentum | 0.9 |
| scion_epsilon | 1e-15 |
| scion_to_signum_lr | 0.2 |
| train_batch_size | 128 |
| warmup | 0 |
| weight_decay | 0.1 |
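
The hyperparameters listed above can be kept in a plain Python dictionary for reuse. The sketch below also shows one possible reading of `lr_schedule = linear` with `warmup = 0` and `min_lr_ratio = 0`: the learning rate starts at `learning_rate` and falls linearly to `learning_rate * min_lr_ratio` by the last step. The names `BEST_CONFIG`, `linear_lr`, and `total_steps` are hypothetical and do not come from the model card, and the schedule is an assumption about what the training code does, not a copy of it.

```python
# Values copied from the "Best configuration" table.
BEST_CONFIG = {
    "beta1": 0.95,
    "decay": 1,
    "learning_rate": 0.016,
    "lr_schedule": "linear",
    "max_grad_norm": 1,
    "min_lr_ratio": 0,
    "momentum": 0.9,
    "scion_epsilon": 1e-15,
    "scion_to_signum_lr": 0.2,
    "train_batch_size": 128,
    "warmup": 0,
    "weight_decay": 0.1,
}


def linear_lr(step, total_steps, config=BEST_CONFIG):
    """Assumed linear schedule: optional linear warmup, then linear decay.

    total_steps is a hypothetical argument; the model card does not state
    the number of training steps. warmup is treated as a fraction of
    total_steps here, which is an assumption (with warmup = 0 the choice
    makes no difference).
    """
    peak = config["learning_rate"]
    floor = peak * config["min_lr_ratio"]
    warmup_steps = int(config["warmup"] * total_steps)

    # Linear warmup from near zero up to the peak learning rate.
    if warmup_steps > 0 and step < warmup_steps:
        return peak * (step + 1) / warmup_steps

    # Linear decay from the peak down to the floor over the remaining steps.
    decay_steps = max(total_steps - warmup_steps, 1)
    progress = min((step - warmup_steps) / decay_steps, 1.0)
    return peak + (floor - peak) * progress
```

Under these assumptions, `linear_lr(0, total_steps)` returns the peak value of 0.016 and the rate reaches 0 at the final step. How the other entries (for example `decay` or `scion_to_signum_lr`) are consumed depends on the training code, which the model card does not describe.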