bluelightai-dev/clt-pretrain-data-v3-eval-tokenized-Qwen3-256 • Viewer • 212k • 35
bluelightai-dev/clt-pretrain-data-v3-tokenized-Qwen3-max-1024 • Viewer • 4.04M • 15
bluelightai-dev/clt-pretrain-data-v3-tokenized-qwen3 • Viewer • 1.81M • 36
bluelightai-dev/clt-pretrain-data-v3 • Viewer • 2.99M • 24
bluelightai-dev/dolma3_dolmino_mix-100B-1125-sample • Viewer • 6.32M • 16
bluelightai-dev/dolma3_mix-150B-1025-sample • Viewer • 4.97M • 15
bluelightai-dev/clt-mixed-eval-data-tokenized-Qwen3 • Viewer • 115k • 2
bluelightai-dev/clt-mixed-eval-data • Viewer • 60k • 3
bluelightai-dev/clt-mixed-data-tokenized-Qwen3 • Viewer • 2.6M • 5
bluelightai-dev/clt-pretrain-eval-data-tokenized-Qwen3-256 • Viewer • 194k • 27
bluelightai-dev/clt-pretrain-data-dedup-tokenized-Qwen3-1024 • Viewer • 2.52M • 58
bluelightai-dev/clt-pretrain-data-v2-dedup • Preview • 6
bluelightai-dev/clt-pretrain-data-tokenized-Qwen3-1024 • Viewer • 2.44M • 55
bluelightai-dev/clt-pretrain-data-v2 • Preview • 10
bluelightai-dev/MathPile_Commercial-formatted • Viewer • 389k • 26
bluelightai-dev/clt_posttrain_data_tokenized • Viewer • 1.34M • 24
bluelightai-dev/common-corpus-sample-open-web • Viewer • 4.8M • 19
bluelightai-dev/common-corpus-sample-open-source • Viewer • 2.02M • 14
bluelightai-dev/common-corpus-sample-open-science • Viewer • 284k • 12
bluelightai-dev/common-corpus-sample-open-government • Viewer • 373k • 15 • 1
bluelightai-dev/common-corpus-sample-open-culture • Viewer • 462k • 22
bluelightai-dev/clt_posttrain_data_tokenized_test_1000 • Viewer • 1.22k • 3
bluelightai-dev/dclm-full-deduped-sample • Viewer • 4.92M • 39
bluelightai-dev/the-stack-dedup-sample • Viewer • 474k • 11
bluelightai-dev/pythia_clt_pretrain_data_tokenized • Viewer • 3.5M • 38
bluelightai-dev/clt_eval_data_qwen3_tokenized_256 • Viewer • 245k • 1
bluelightai-dev/clt_pretrain_data_qwen_tokenized • Viewer • 16.7M • 73
bluelightai-dev/clt_posttrain_data_qwen_tokenized • Viewer • 1.34M
bluelightai-dev/clt_pretrain_data • Viewer • 6.12M • 198
bluelightai-dev/clt_posttrain_data • Viewer • 935k • 37