# VerySmollGPT
A lightweight character-level GPT model trained entirely on a Raspberry Pi 5, demonstrating that small but capable language models can be trained on consumer hardware with limited resources.
## Model Description
VerySmollGPT is a decoder-only transformer model (GPT-style architecture) designed for character-level text generation. It was trained on the TinyStories dataset to generate coherent short stories.
- Developed by: Kittykat924
- Model type: Decoder-only Transformer (GPT)
- Language: English
- License: MIT
- Trained on: Raspberry Pi 5 (CPU only)
- Training duration: ~9 days
- Parameters: 4.80M unique (4.83M if the tied embedding/output weights are counted separately)
## Model Architecture
| Component | Value |
|---|---|
| Vocabulary Size | 104 characters |
| Embedding Dimension | 256 |
| Layers | 6 |
| Attention Heads | 8 |
| Feed-forward Dimension | 1024 |
| Context Window | 128 tokens |
| Dropout | 0.1 |
| Weight Tying | Yes (token embeddings ↔ output layer) |
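In code, this table corresponds to a configuration object along the lines of the sketch below. The field names are illustrative; the actual keys in `config.json` may differ.

```python
from dataclasses import dataclass

@dataclass
class VerySmollGPTConfig:
    # Values taken from the architecture table above; the actual
    # key names in config.json may differ.
    vocab_size: int = 104
    d_model: int = 256        # embedding dimension
    n_layers: int = 6
    n_heads: int = 8
    d_ff: int = 1024          # feed-forward dimension
    context_window: int = 128
    dropout: float = 0.1
    tie_weights: bool = True  # token embeddings shared with the output layer
```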
## Training Details

### Training Data
- Dataset: TinyStories
- Dataset Size: ~25MB (optimized for Raspberry Pi)
- Total Tokens: ~25M characters
- Train/Val Split: 90/10 (see the data-preparation sketch below)
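Concretely, the split and batching could be done as follows. This is an illustrative sketch only: `tinystories.txt` is an assumed filename, and the actual preprocessing script is not part of this card.

```python
import torch

# Assumed filename; the actual ~25 MB corpus file is not distributed here.
with open('tinystories.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Map each character to an integer id (see the Tokenization section below)
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}
data = torch.tensor([char_to_idx[c] for c in text], dtype=torch.long)

# 90/10 train/validation split, as described above
n = int(0.9 * len(data))
train_data, val_data = data[:n], data[n:]

def get_batch(split, batch_size=16, context=128):
    """Sample random windows of `context` characters plus next-char targets."""
    src = train_data if split == 'train' else val_data
    ix = torch.randint(len(src) - context - 1, (batch_size,))
    x = torch.stack([src[i:i + context] for i in ix])
    y = torch.stack([src[i + 1:i + context + 1] for i in ix])
    return x, y
```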
### Training Procedure

**Hardware:**
- Raspberry Pi 5
- CPU-only training (no GPU)
- Training time: ~9 days
**Hyperparameters** (a training-loop sketch follows this list):
- Epochs: 3
- Batch Size: 16
- Learning Rate: 3e-4 (initial)
- Min Learning Rate: 1e-4 (cosine annealing)
- Optimizer: AdamW (β₁=0.9, β₂=0.95)
- Weight Decay: 0.01
- Gradient Clipping: 1.0
- Max Batches per Epoch: 130,000
- Context Window: 128 tokens
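A minimal training loop consistent with these settings might look like the sketch below. It is illustrative only: the actual training script is not included in this card, and `model` and `get_batch` stand in for the objects defined elsewhere (e.g. the data sketch above).

```python
import torch
import torch.nn.functional as F
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

def train(model, get_batch, epochs=3, batches_per_epoch=130_000):
    # Hyperparameters as listed above
    opt = AdamW(model.parameters(), lr=3e-4, betas=(0.9, 0.95),
                weight_decay=0.01)
    # Cosine annealing from 3e-4 down to 1e-4 over all steps
    # (3 epochs x 130,000 batches = 390,000 global steps)
    sched = CosineAnnealingLR(opt, T_max=epochs * batches_per_epoch,
                              eta_min=1e-4)

    model.train()
    for epoch in range(epochs):
        for step in range(batches_per_epoch):
            x, y = get_batch('train')    # (B, T) inputs and next-char targets
            logits = model(x)            # (B, T, vocab_size)
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                                   y.view(-1))
            opt.zero_grad(set_to_none=True)
            loss.backward()
            # Gradient clipping at 1.0, as listed above
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            opt.step()
            sched.step()
```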
**Training Stats:**
- Final Epoch: 2 (zero-indexed; the released checkpoint is from the third and final epoch)
- Global Steps: 390,000
- Best Validation Loss: 0.692
### Tokenization
Character-level tokenization with 104 unique tokens:
- 100 regular characters (letters, numbers, punctuation, special characters)
- 4 special tokens: `<PAD>`, `<UNK>`, `<BOS>`, `<EOS>` (see the tokenizer sketch below)
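The exact character inventory ships with the model files; the sketch below shows one plausible way such a vocabulary could be assembled and used. Names like `SPECIAL_TOKENS`, `encode`, and `decode` are illustrative, not the repository's actual API.

```python
# Hypothetical character-level tokenizer; the real vocabulary and token
# order are defined by the files shipped with the model.
SPECIAL_TOKENS = ['<PAD>', '<UNK>', '<BOS>', '<EOS>']

def build_vocab(corpus_chars):
    """Special tokens first, then the 100 regular characters."""
    vocab = SPECIAL_TOKENS + sorted(corpus_chars)
    char_to_idx = {c: i for i, c in enumerate(vocab)}
    idx_to_char = {i: c for c, i in char_to_idx.items()}
    return char_to_idx, idx_to_char

def encode(text, char_to_idx):
    unk = char_to_idx['<UNK>']
    return [char_to_idx.get(c, unk) for c in text]  # unknown chars -> <UNK>

def decode(ids, idx_to_char):
    return ''.join(idx_to_char[i] for i in ids)
```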
## Usage

### Installation

```bash
pip install torch safetensors
```

### Loading the Model
```python
import json

import torch
from safetensors.torch import load_file

# Load model weights
state_dict = load_file('model.safetensors')

# Load configuration
with open('config.json', 'r') as f:
    config = json.load(f)

# Note: You'll need to implement the VerySmollGPT architecture
# (or use the original model.py from the repository) before calling
# model.load_state_dict(state_dict).
```
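Since the architecture code is not bundled with the weights, here is a hedged sketch of a decoder-only transformer matching the configuration table above, built from standard PyTorch modules. The real `model.py` almost certainly differs in structure and parameter names, so this skeleton will not load `model.safetensors` as-is; it only illustrates the shape of the network.

```python
import torch
import torch.nn as nn

class VerySmollGPTSketch(nn.Module):
    """Illustrative decoder-only transformer; NOT the repository's model.py."""
    def __init__(self, vocab_size=104, d_model=256, n_layers=6,
                 n_heads=8, d_ff=1024, context=128, dropout=0.1):
        super().__init__()
        self.context = context
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(context, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=d_ff,
            dropout=dropout, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)
        self.head.weight = self.tok_emb.weight  # weight tying, as in the table

    def forward(self, idx):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask so each position attends only to earlier positions
        causal = nn.Transformer.generate_square_subsequent_mask(T).to(idx.device)
        x = self.blocks(x, mask=causal)
        return self.head(self.ln_f(x))  # (B, T, vocab_size) logits
```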
### Text Generation Example
```python
import torch

# Assuming the model, char_to_idx, and idx_to_char are already loaded
model.eval()

# Encode the prompt (character-level)
prompt = "Once upon a time"
input_ids = [char_to_idx[c] for c in prompt]
input_tensor = torch.tensor([input_ids], dtype=torch.long)

# Generate
with torch.no_grad():
    output_ids = model.generate(
        input_tensor,
        max_new_tokens=200,
        temperature=0.8,
        top_k=40,
    )

# Decode the output back to text
generated_text = ''.join(idx_to_char[i] for i in output_ids[0].tolist())
print(generated_text)
```
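The `generate` call above implies an autoregressive sampling loop with temperature scaling and top-k filtering. If you implement the architecture yourself, a standard version (hypothetical, not the repository's exact method) looks like this:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens=200, temperature=0.8, top_k=40,
             context=128):
    """Standard temperature + top-k sampling; illustrative, not the repo's code."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -context:]          # crop to the context window
        logits = model(idx_cond)[:, -1, :]    # logits for the last position
        logits = logits / temperature         # temperature scaling
        if top_k is not None:
            v, _ = torch.topk(logits, top_k)
            logits[logits < v[:, [-1]]] = -float('inf')  # keep only top-k
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)
    return idx
```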
## Example Outputs

**Prompt:** "Once upon a time"

**Generated:**

> Once upon a time, there was a little girl named Lily. She loved to play with her toys and her favorite was a penguin that had a shiny metal box on it. Timmy liked to...
Prompt: "The quick brown fox"
Generated:
The quick brown fox wanted to play with him again. The fox said he was not fair anymore. He said he was sorry and that he learned his lesson...
## Limitations and Bias
- Character-level tokenization: Less efficient than BPE/WordPiece for longer texts
- Small context window: 128 tokens limits long-range dependencies
- Training data: Limited to TinyStories dataset style (simple children's stories)
- Vocabulary: Only 104 characters, may not handle all Unicode characters
- Coherence: Best for short-form text generation (stories, snippets)
## Environmental Impact
This model was intentionally trained on a Raspberry Pi 5 to demonstrate low-power AI training:
- Hardware: Raspberry Pi 5 (CPU only, ~15W power consumption)
- Training Duration: ~9 days
- Estimated Energy: ~3.24 kWh total (15 W × 9 days × 24 h ≈ 3.24 kWh)
- Carbon Footprint: Minimal compared to GPU-based training
## Technical Specifications
- Model Size: 19 MB (safetensors format)
- Inference Memory: ~200-300 MB RAM
- Training Memory: ~1-2 GB RAM (batch_size=16)
- Precision: FP32
## Acknowledgments
- Architecture inspired by Andrej Karpathy's nanoGPT
- Dataset: TinyStories by Ronen Eldan and Yuanzhi Li
- Trained on Raspberry Pi 5 to demonstrate accessible AI training
## Evaluation Results

All metrics are self-reported on the TinyStories dataset.

| Metric | Value |
|---|---|
| Training Loss (Final) | 0.678 |
| Validation Loss (Final) | 0.703 |
| Validation Loss (Best) | 0.692 |