crumb
AI & ML interests
For what I'm working on right now, check out https://hf.co/crumbs-playground (the mammoth image button on my profile)
Recent Activity
updated a model about 10 hours ago: crumbs-playground/clmr1-qwen3.5-0.8b-warm-start-stage-2
published a model about 10 hours ago: crumbs-playground/clmr1-qwen3.5-0.8b-warm-start-stage-2
liked a model 1 day ago: NX-AI/xLSTM-7b

Organizations
MoLora-v1
Model assets for the first Mixture-of-Lora technique applied to Llama. https://bit.ly/48bqshl
MoLora-v2
First prototype of the second iteration of MoLora, applying mixture-of-experts techniques to the Llama 2 model.
GPT2-Linear
GPT-2 models using Linear layers instead of Conv1D layers for convenience.
Shrink Llama - V1
Parts of Meta's Llama 2 models, chopped up and trained. CoreX means the first X layers were kept.
Cramp(ed) Models
Smaller models trained locally on my 2xA6000 Lambda Vector
MoAT (More Artificial Tokens)
Lets the LM learn a soft "multi-step program" that predicts future tokens, instead of learning to predict future tokens directly.