crumb
AI & ML interests
For what I'm working on right now, check out https://hf.co/crumbs-playground (the mammoth image button on my profile)
Recent Activity
updated a model about 10 hours ago: crumbs-playground/clmr1-qwen3.5-0.8b-warm-start-stage-2
published a model about 10 hours ago: crumbs-playground/clmr1-qwen3.5-0.8b-warm-start-stage-2
liked a model 1 day ago: NX-AI/xLSTM-7b

Organizations
MoLora-v1
Model assets for the first Mixture-of-Lora technique applied to Llama. https://bit.ly/48bqshl
MoLora-v2
First prototype of the second iteration of MoLora, applying mixture-of-experts techniques to the Llama 2 model.
GPT2-Linear
GPT-2 models using Linear layers instead of Conv1D layers for convenience.
Shrink Llama - V1
Parts of Meta's Llama 2 models, chopped up and trained. CoreX means the first X layers were kept.
Cramp(ed) Models
Smaller models trained locally on my 2xA6000 Lambda Vector
MoAT (More Artificial Tokens)
Lets the LM learn a soft "multi-step program" that predicts future tokens, instead of learning to predict future tokens directly.