batmanLovesAI/HeliumLM
Text Generation
PyTorch
roneneldan/TinyStories
English
slm
transformer
attention
optimization
tinystories
educational
arxiv:2305.07759
arxiv:2505.19529
License: mit
main
HeliumLM/checkpoints
962 MB
1 contributor
History: 40 commits
batmanLovesAI
Add: Heliumlm vanilla trained on 50% of the dataset
de77f2e
22 minutes ago
helium-distill-1-08-model-iter-14000.pt
Safe
pickle
Detected Pickle imports (4): "torch.ComplexFloatStorage", "torch.FloatStorage", "collections.OrderedDict", "torch._utils._rebuild_tensor_v2"
106 MB
xet
Removed unnecessary models and renamed models for better understanding
16 days ago
helium-distill-1-08-model-iter-8000.pt
Safe
pickle
Detected Pickle imports (4): "collections.OrderedDict", "torch.FloatStorage", "torch.ComplexFloatStorage", "torch._utils._rebuild_tensor_v2"
106 MB
xet
Removed unnecessary models and renamed models for better understanding
16 days ago
helium-distill-5-05-model-iter-8000.pt
Safe
pickle
Detected Pickle imports (4): "torch._utils._rebuild_tensor_v2", "torch.FloatStorage", "collections.OrderedDict", "torch.ComplexFloatStorage"
106 MB
xet
Removed unnecessary models and renamed models for better understanding
16 days ago
heliumLM-distilled-final-phase-1.pt
Safe
pickle
Detected Pickle imports (4): "torch.FloatStorage", "torch.ComplexFloatStorage", "torch._utils._rebuild_tensor_v2", "collections.OrderedDict"
106 MB
xet
Added first model of the final phase
13 days ago
heliumlm-grammar-model.pt
Safe
pickle
Detected Pickle imports (4): "torch.FloatStorage", "torch.ComplexFloatStorage", "torch._utils._rebuild_tensor_v2", "collections.OrderedDict"
106 MB
xet
Deleted irrelevant models and added a grammatically correct model trained in phases on the entire TinyStories dataset (using quarterly batch technique)
14 days ago
heliumlm-vanilla-swiglu-2.pt
215 MB
xet
Add: New Vanilla model trained on 25% of data with no ignore_index in loss function
about 14 hours ago
heliumlm-vanilla-swiglu-3.pt
215 MB
xet
Add: Heliumlm vanilla trained on 50% of the dataset
22 minutes ago
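
The "Detected Pickle imports" entries in the listing above name the globals that each checkpoint's pickle stream references. As a rough illustration of how such a list can be derived, here is a minimal sketch using only the Python standard library. The function name `list_pickle_imports` is hypothetical, and this is not necessarily how the Hub's scanner works. It also assumes a raw pickle byte stream as input; a `.pt` file written by a recent `torch.save` is typically a zip archive whose `data.pkl` member would have to be extracted first.

```python
import collections
import pickle
import pickletools


def list_pickle_imports(data: bytes) -> set:
    """Return the 'module.name' globals referenced by a pickle byte stream.

    Hypothetical helper for illustration; it walks opcodes without
    executing the pickle, so no code from the stream is run.
    """
    imports = set()
    strings = []  # string arguments seen so far, needed for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Protocols 0-3: the argument arrives as "module name".
            module, name = arg.split(" ", 1)
            imports.add(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+: module and name are the two most recent strings.
            # Simplification: strings re-used via the memo are not tracked.
            if len(strings) >= 2:
                imports.add(f"{strings[-2]}.{strings[-1]}")
        elif isinstance(arg, str):
            strings.append(arg)
    return imports


if __name__ == "__main__":
    # Demo on a small in-memory pickle rather than a real checkpoint.
    demo = pickle.dumps(collections.OrderedDict(a=1), protocol=4)
    print(sorted(list_pickle_imports(demo)))
```

Because the sketch only reads opcodes, it can be pointed at an untrusted stream without unpickling it. The memo simplification means it may miss globals in larger files, so treat its output as a lower bound rather than a full audit.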