VerySmollGPT

A lightweight character-level GPT model trained entirely on a Raspberry Pi 5, demonstrating that a small but capable language model can be trained end-to-end on low-cost consumer hardware.

Model Description

VerySmollGPT is a decoder-only transformer model (GPT-style architecture) designed for character-level text generation. It was trained on the TinyStories dataset to generate coherent short stories.

  • Developed by: Kittykat924
  • Model type: Decoder-only Transformer (GPT)
  • Language: English
  • License: MIT
  • Trained on: Raspberry Pi 5 (CPU only)
  • Training duration: ~9 days
  • Parameters: 4.80M unique (4.83M if the tied embedding and output weights are counted separately)

Model Architecture

Component               Value
----------------------  --------------------------------------
Vocabulary Size         104 characters
Embedding Dimension     256
Layers                  6
Attention Heads         8
Feed-forward Dimension  1024
Context Window          128 tokens
Dropout                 0.1
Weight Tying            Yes (token embeddings ↔ output layer)
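
For reference, here is a minimal PyTorch sketch of a model matching these dimensions. It illustrates the configuration above; it is not the repository's actual model.py, and the block layout and names are assumptions:

import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm transformer decoder block (assumed layout)."""
    def __init__(self, d=256, heads=8, ff=1024, drop=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, heads, dropout=drop, batch_first=True)
        self.ln2 = nn.LayerNorm(d)
        self.mlp = nn.Sequential(
            nn.Linear(d, ff), nn.GELU(), nn.Linear(ff, d), nn.Dropout(drop)
        )

    def forward(self, x, mask):
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a
        return x + self.mlp(self.ln2(x))

class VerySmollGPT(nn.Module):
    def __init__(self, vocab=104, d=256, layers=6, heads=8, ff=1024, ctx=128, drop=0.1):
        super().__init__()
        self.tok = nn.Embedding(vocab, d)      # token embeddings
        self.pos = nn.Embedding(ctx, d)        # learned positional embeddings (assumed)
        self.drop = nn.Dropout(drop)
        self.blocks = nn.ModuleList(Block(d, heads, ff, drop) for _ in range(layers))
        self.ln_f = nn.LayerNorm(d)
        self.head = nn.Linear(d, vocab, bias=False)
        self.head.weight = self.tok.weight     # weight tying: embeddings ↔ output layer

    def forward(self, idx):
        T = idx.size(1)
        # Causal mask: True above the diagonal blocks attention to future positions
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=idx.device), 1)
        x = self.drop(self.tok(idx) + self.pos(torch.arange(T, device=idx.device)))
        for block in self.blocks:
            x = block(x, mask)
        return self.head(self.ln_f(x))

With these dimensions, the unique parameter count works out to roughly 4.8M, consistent with the table above.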

Training Details

Training Data

  • Dataset: TinyStories
  • Dataset Size: ~25MB (optimized for Raspberry Pi)
  • Total Tokens: ~25M (character-level, so one token per character)
  • Train/Val Split: 90/10
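
For concreteness, a sketch of that split on the raw character stream (a single contiguous 90/10 slice is an assumption, and the filename is hypothetical):

with open('tinystories.txt', 'r', encoding='utf-8') as f:  # hypothetical filename
    data = f.read()

n = int(0.9 * len(data))                   # 90/10 train/val split
train_data, val_data = data[:n], data[n:]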

Training Procedure

Hardware:

  • Raspberry Pi 5
  • CPU-only training (no GPU)
  • Training time: ~9 days

Hyperparameters:

  • Epochs: 3
  • Batch Size: 16
  • Learning Rate: 3e-4 (initial)
  • Min Learning Rate: 1e-4 (cosine annealing)
  • Optimizer: AdamW (β₁=0.9, β₂=0.95)
  • Weight Decay: 0.01
  • Gradient Clipping: 1.0
  • Max Batches per Epoch: 130,000
  • Context Window: 128 tokens
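
Put together, a training step consistent with these hyperparameters might look like the sketch below. The names model, train_loader, and the loss wiring are hypothetical placeholders; the actual training script lives in the repository:

import torch
import torch.nn as nn

optimizer = torch.optim.AdamW(
    model.parameters(), lr=3e-4, betas=(0.9, 0.95), weight_decay=0.01
)
# Cosine annealing from 3e-4 down to a floor of 1e-4 over all steps
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=390_000, eta_min=1e-4
)
criterion = nn.CrossEntropyLoss()

for epoch in range(3):
    for step, (x, y) in enumerate(train_loader):   # x, y: (16, 128) token ids
        if step >= 130_000:                        # max batches per epoch
            break
        logits = model(x)                          # (16, 128, 104)
        loss = criterion(logits.view(-1, logits.size(-1)), y.view(-1))
        optimizer.zero_grad()
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping
        optimizer.step()
        scheduler.step()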

Training Stats:

  • Final Epoch: 2 (zero-indexed; the saved checkpoint is from the third and final epoch)
  • Global Steps: 390,000
  • Best Validation Loss: 0.692

Tokenization

Character-level tokenization with 104 unique tokens:

  • 100 regular characters (letters, numbers, punctuation, special characters)
  • 4 special tokens: <PAD>, <UNK>, <BOS>, <EOS>
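
As a sketch, the lookup tables used in the examples below could be built like this (the index ordering is an assumption; the repository's tokenizer defines the real mapping):

SPECIALS = ['<PAD>', '<UNK>', '<BOS>', '<EOS>']

def build_vocab(corpus):
    chars = sorted(set(corpus))                # ~100 regular characters
    tokens = SPECIALS + chars                  # 104 tokens in total
    char_to_idx = {t: i for i, t in enumerate(tokens)}
    idx_to_char = {i: t for t, i in char_to_idx.items()}
    return char_to_idx, idx_to_char

def encode(text, char_to_idx):
    unk = char_to_idx['<UNK>']                 # unknown characters fall back to <UNK>
    return [char_to_idx.get(c, unk) for c in text]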

Usage

Installation

pip install torch safetensors

Loading the Model

import json

import torch
import torch.nn as nn
from safetensors.torch import load_file

# Load model weights
state_dict = load_file('model.safetensors')

# Load configuration
with open('config.json', 'r') as f:
    config = json.load(f)

# Note: You'll need to implement the VerySmollGPT architecture
# or use the original model.py from the repository
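
Assuming the architecture sketch from the Model Architecture section above, and that config.json stores the hyperparameters under these key names (an assumption), instantiation could look like:

model = VerySmollGPT(
    vocab=config.get('vocab_size', 104),    # config key names are assumptions
    d=config.get('n_embd', 256),
    layers=config.get('n_layer', 6),
    heads=config.get('n_head', 8),
    ff=config.get('ffn_dim', 1024),
    ctx=config.get('block_size', 128),
)
model.load_state_dict(state_dict)
model.eval()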

Text Generation Example

# Assuming you have the model loaded
model.eval()

# Encode your prompt (character-level)
prompt = "Once upon a time"
input_ids = [char_to_idx[c] for c in prompt]
input_tensor = torch.tensor([input_ids], dtype=torch.long)

# Generate
with torch.no_grad():
    output_ids = model.generate(
        input_tensor,
        max_new_tokens=200,
        temperature=0.8,
        top_k=40
    )

# Decode output
generated_text = ''.join([idx_to_char[i] for i in output_ids[0].tolist()])
print(generated_text)
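
model.generate is provided by the repository's model.py. If you implement the model yourself, a minimal sampling loop with the same temperature and top-k parameters (assumed to match the original's semantics) could be:

import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens=200, temperature=0.8, top_k=40, ctx=128):
    for _ in range(max_new_tokens):
        logits = model(idx[:, -ctx:])[:, -1, :] / temperature  # last-position logits
        if top_k is not None:
            v, _ = torch.topk(logits, top_k)
            logits[logits < v[:, [-1]]] = float('-inf')        # keep only top-k
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)      # sample one token
        idx = torch.cat([idx, next_id], dim=1)
    return idx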

Example Outputs

Prompt: "Once upon a time"

Generated:

Once upon a time, there was a little girl named Lily. She loved to play with her toys and her favorite was a penguin that had a shiny metal box on it. Timmy liked to...

Prompt: "The quick brown fox"

Generated:

The quick brown fox wanted to play with him again. The fox said he was not fair anymore. He said he was sorry and that he learned his lesson...

Limitations and Bias

  • Character-level tokenization: Less efficient than BPE/WordPiece for longer texts
  • Small context window: 128 tokens limits long-range dependencies
  • Training data: Limited to TinyStories dataset style (simple children's stories)
  • Vocabulary: Only 104 characters; out-of-vocabulary Unicode characters map to <UNK>
  • Coherence: Best for short-form text generation (stories, snippets)

Environmental Impact

This model was intentionally trained on a Raspberry Pi 5 to demonstrate low-power AI training:

  • Hardware: Raspberry Pi 5 (CPU only, ~15W power consumption)
  • Training Duration: ~9 days
  • Estimated Energy: ~3.24 kWh total (~15 W × 9 days ≈ 216 h)
  • Carbon Footprint: Minimal compared to GPU-based training

Technical Specifications

  • Model Size: 19 MB (safetensors format)
  • Inference Memory: ~200-300 MB RAM
  • Training Memory: ~1-2 GB RAM (batch_size=16)
  • Precision: FP32

Acknowledgments

GitHub
