Pretraining Data
opencsg/Fineweb-Edu-Chinese-V2.1 • 958M rows • 55.8k downloads • 63 likes
56.2M rows • 140k downloads • 30 likes
3.8B rows • 13.9k downloads • 106 likes
allenai/dolma3_dolmino_pool • 87.3k downloads • 7 likes
allenai/dolma3_longmino_pool • 48.6k downloads • 10 likes
476M rows • 32.1k downloads • 817 likes
4.48B rows • 68.3k downloads • 757 likes
61.6M rows • 6.29k downloads • 284 likes
819M rows • 53.7k downloads • 11 likes
tokyotech-llm/swallow-code-v2 • 147M rows • 174k downloads • 31 likes
ByteDance-Seed/Code-Contests-Plus • 49.2k rows • 25.9k downloads • 60 likes
7.09M rows • 4.85k downloads • 158 likes
nvidia/Nemotron-Pretraining-Code-v2 • 836M rows • 3.13k downloads • 103 likes
nvidia/Nemotron-Pretraining-Specialized-v1 • 60.7M rows • 3.79k downloads • 70 likes
nvidia/Nemotron-CC-Math-v1 • 190M rows • 3.52k downloads • 67 likes
nvidia/Nemotron-Pretraining-SFT-v1 • 299M rows • 2.8k downloads • 62 likes
1.86M rows • 17.6k downloads • 225 likes
EssentialAI/essential-web-v1.0 • 111k downloads • 218 likes
EssentialAI/eai-taxonomy-stem-w-dclm • 134 downloads • 6 likes
EssentialAI/eai-taxonomy-med-w-dclm • 81.2M rows • 131 downloads • 8 likes
EssentialAI/eai-taxonomy-code-w-dclm • 274M rows • 85k downloads • 9 likes
EssentialAI/eai-taxonomy-math-w-fm • 21.6M rows • 190 downloads • 5 likes
27.9B rows • 29 downloads • 3 likes
DataMuncher-Labs/UltiMath • 32.9B rows • 17.8k downloads • 42 likes
HuggingFaceFW/finetranslations • 3.33B rows • 42.1k downloads • 270 likes
69.9k rows • 61.2k downloads • 364 likes