See axolotl config

axolotl version: `0.13.0.dev0`

```yaml
base_model: /home/alex/Workspace/sllama/out_5/checkpoint-1722000
trust_remote_code: true
resize_token_embeddings_to_32x: true
plugins:
- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
- axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
unfrozen_parameters:
- ^(?![\s\S]*embed_tokens)[\s\S]+$
datasets:
  - path: fw_merged
    type: input_output
    shards: 8
    shards_idx: 0
dataloader_num_workers: 0
group_by_length: false
dataset_prepared_path: data_prep
output_dir: ./out_6
dataloader_pin_memory: true
shuffle_merged_datasets: false
sequence_len: 2048
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: true
use_tensorboard: true
use_wandb: true
wandb_project: sllama
gradient_accumulation_steps: 16
micro_batch_size: 1
num_epochs: 1
#max_steps: 1_000_000
save_steps: 500
save_total_limit: 2
save_only_model: true
optimizer: sgd
optim_args:
  momentum: 0.98
lr_scheduler: constant
learning_rate: 0.1
#embedding_lr: 5e-7
#cosine_constant_lr_ratio: 0.1
max_grad_norm: 1.0
bf16: auto
fp8: false
gradient_checkpointing: false
gradient_checkpointing_kwargs:
  use_reentrant: false
logging_steps: 10
torch_compile: false
torch_compile_backend: inductor
torch_compile_mode: default
flash_attention: true
#warmup_ratio: 0.05
weight_decay: 0.01
```
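The `unfrozen_parameters` pattern deserves a note: it is a negative lookahead that matches every parameter name *except* those containing `embed_tokens`, so the (resized) embedding table stays frozen while the rest of the network trains. A minimal sketch of how the pattern behaves (the parameter names below are illustrative, not taken from this model):

```python
import re

# Negative lookahead: match any parameter name that does NOT contain
# "embed_tokens" anywhere, i.e. everything except the embedding table.
pattern = re.compile(r"^(?![\s\S]*embed_tokens)[\s\S]+$")

for name in [
    "model.embed_tokens.weight",               # expected: frozen
    "model.layers.0.self_attn.q_proj.weight",  # expected: unfrozen
    "lm_head.weight",                          # expected: unfrozen
]:
    print(name, "->", "unfrozen" if pattern.match(name) else "frozen")
```

With a recent axolotl install, a config like this is typically launched with `axolotl train config.yaml` (or `accelerate launch -m axolotl.cli.train config.yaml`); the exact invocation depends on your axolotl version.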
# out_6
This model was trained from scratch. The auto-generated card did not record a dataset name; per the axolotl config above, training used the local `fw_merged` dataset (sharded 8 ways, shard index 0).
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.1
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: SGD with momentum=0.98 (see the note after this list)
- lr_scheduler_type: constant
- lr_scheduler_warmup_steps: 100
- training_steps: 133451
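Two derived numbers worth spelling out: the total train batch size of 16 is micro_batch_size (1) × gradient_accumulation_steps (16) on a single device, and 133,451 steps × 16 sequences × 2,048 tokens puts the run at roughly 4.37B tokens at most (an upper bound, since `pad_to_sequence_len: true` pads shorter examples to the full 2,048). Assuming the default `torch.optim.SGD` semantics (dampening 0, no Nesterov), these settings correspond to the per-step update with momentum $\mu = 0.98$, learning rate $\eta = 0.1$, and coupled weight decay $\lambda = 0.01$:

$$
g_t = \nabla_\theta L(\theta_t) + \lambda\,\theta_t, \qquad
v_t = \mu\, v_{t-1} + g_t, \qquad
\theta_{t+1} = \theta_t - \eta\, v_t
$$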
### Training results
### Framework versions
- Transformers 4.56.1
- Pytorch 2.7.1+cu128
- Datasets 4.0.0
- Tokenizers 0.22.1
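To use the checkpoint, here is a minimal loading sketch with the standard transformers API, assuming the final weights live in the `./out_6` output directory from the config (`trust_remote_code` mirrors the config setting):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path and trust_remote_code follow the axolotl config above;
# adjust the path if loading from a pushed Hub repo instead.
model = AutoModelForCausalLM.from_pretrained("./out_6", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("./out_6", trust_remote_code=True)
```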