tinyshakespeare-42m

This model is a fine-tuned version of a base model (not named in this card) on the tiny_shakespeare dataset. It achieves the following results on the evaluation set:

  • Loss: 4.8737
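
To make the loss figure easier to interpret, it can be converted to perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card. The helper name `perplexity` is illustrative.

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_nats)


# Evaluation loss reported above (assumed to be mean token-level cross-entropy).
eval_loss = 4.8737
print(f"perplexity: {perplexity(eval_loss):.1f}")
```

A lower perplexity means the model spreads its probability mass over fewer candidate tokens at each position.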

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 40
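
The learning-rate settings above can be illustrated with a small standalone sketch of linear warmup followed by cosine decay. It assumes 640 total optimizer steps (the last step reported in the training results), rounds the warmup length up to a whole step, and decays to zero over half a cosine period. The exact scheduler implementation used for this run is not stated in the card, so treat `lr_at` as an approximation rather than the training code.

```python
import math

BASE_LR = 3e-4        # learning_rate
WARMUP_RATIO = 0.03   # lr_scheduler_warmup_ratio
TOTAL_STEPS = 640     # assumption: last step listed in the training results


def lr_at(step: int, base_lr: float = BASE_LR, total: int = TOTAL_STEPS,
          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    warmup = math.ceil(total * warmup_ratio)  # 20 steps for 640 * 0.03
    if step < warmup:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup)
    # Fraction of the decay phase completed, from 0.0 to 1.0.
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 10, 20, 330, 640):
    print(step, f"{lr_at(step):.6f}")
```

With these numbers the warmup covers only the first 20 steps, a little over one epoch at 16 steps per epoch.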

Training results

Training Loss   Epoch   Step   Validation Loss
No log          1.0     16     6.9032
No log          2.0     32     5.9671
No log          3.0     48     5.7496
6.6567          4.0     64     5.6037
6.6567          5.0     80     5.4368
6.6567          6.0     96     5.2910
5.4744          7.0     112    5.1749
5.4744          8.0     128    5.0940
5.4744          9.0     144    5.0452
5.0133          10.0    160    5.0255
5.0133          11.0    176    5.0034
5.0133          12.0    192    4.9969
4.756           13.0    208    4.9831
4.756           14.0    224    4.9877
4.756           15.0    240    4.9961
4.5668          16.0    256    5.0005
4.5668          17.0    272    5.0135
4.5668          18.0    288    5.0311
4.3775          19.0    304    5.0535
4.3775          20.0    320    5.0667
4.3775          21.0    336    5.0927
4.1854          22.0    352    5.1175
4.1854          23.0    368    5.1487
4.1854          24.0    384    5.1702
3.9762          25.0    400    5.1887
3.9762          26.0    416    5.2351
3.9762          27.0    432    5.2616
3.9762          28.0    448    5.2936
3.793           29.0    464    5.3212
3.793           30.0    480    5.3417
3.793           31.0    496    5.3652
3.639           32.0    512    5.3838
3.639           33.0    528    5.4027
3.639           34.0    544    5.4140
3.5439          35.0    560    5.4247
3.5439          36.0    576    5.4353
3.5439          37.0    592    5.4410
3.4797          38.0    608    5.4424
3.4797          39.0    624    5.4424
3.4797          40.0    640    5.4422
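
In the table above, validation loss reaches its minimum partway through training and then rises while training loss keeps falling. The sketch below locates that minimum from the logged values. The list is copied from the Validation Loss column, and the helper name `best_epoch` is illustrative.

```python
# Validation loss for epochs 1..40, copied from the training results table.
val_loss = [
    6.9032, 5.9671, 5.7496, 5.6037, 5.4368, 5.2910, 5.1749, 5.0940,
    5.0452, 5.0255, 5.0034, 4.9969, 4.9831, 4.9877, 4.9961, 5.0005,
    5.0135, 5.0311, 5.0535, 5.0667, 5.0927, 5.1175, 5.1487, 5.1702,
    5.1887, 5.2351, 5.2616, 5.2936, 5.3212, 5.3417, 5.3652, 5.3838,
    5.4027, 5.4140, 5.4247, 5.4353, 5.4410, 5.4424, 5.4424, 5.4422,
]


def best_epoch(losses):
    """Return (1-based epoch, loss) for the lowest validation loss."""
    idx = min(range(len(losses)), key=losses.__getitem__)
    return idx + 1, losses[idx]


epoch, loss = best_epoch(val_loss)
gap = val_loss[-1] - loss  # how far the final epoch is above the minimum
print(f"lowest validation loss {loss:.4f} at epoch {epoch}")
print(f"final epoch is {gap:.4f} above the minimum")
```

A checkpoint-selection or early-stopping rule keyed on validation loss would pick the epoch this function returns rather than the final one.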

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.6.0+cu124
  • Datasets 3.6.0
  • Tokenizers 0.22.1

Model size

  • Parameters: 30.5M
  • Tensor type: F32
  • Format: Safetensors
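
As a rough guide to the download and memory footprint, the weight size follows from the parameter count and tensor type listed above. This sketch assumes 4 bytes per F32 parameter and ignores file metadata, activations, and optimizer state. The parameter count is the rounded 30.5M figure, so the result is approximate.

```python
PARAMS = 30.5e6        # rounded parameter count listed above
BYTES_PER_PARAM = 4    # F32 uses 4 bytes per value


def weight_bytes(n_params: float, bytes_per_param: int) -> int:
    """Approximate size of the raw weights in bytes."""
    return int(n_params * bytes_per_param)


size = weight_bytes(PARAMS, BYTES_PER_PARAM)
print(f"{size / 1e6:.0f} MB ({size / 2**20:.1f} MiB)")
```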