VerySmollGPT

A lightweight character-level GPT model trained entirely on a Raspberry Pi 5, demonstrating that a small but capable language model can be trained end-to-end on low-cost consumer hardware.

Model Description

VerySmollGPT is a decoder-only transformer model (GPT-style architecture) designed for character-level text generation. It was trained on the TinyStories dataset to generate coherent short stories.

  • Developed by: Kittykat924
  • Model type: Decoder-only Transformer (GPT)
  • Language: English
  • License: MIT
  • Trained on: Raspberry Pi 5 (CPU only)
  • Training duration: ~9 days
  • Parameters: 4.80M unique (4.83M if the tied embedding and output weights are counted separately)

Model Architecture

Component               Value
----------------------  --------------------------------------
Vocabulary Size         104 characters
Embedding Dimension     256
Layers                  6
Attention Heads         8
Feed-forward Dimension  1024
Context Window          128 tokens
Dropout                 0.1
Weight Tying            Yes (token embeddings ↔ output layer)
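
For reference, here is a minimal PyTorch sketch of a model matching these dimensions. It illustrates the configuration above; it is not the repository's actual model.py, and the block layout and names are assumptions:

import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm transformer decoder block (assumed layout)."""
    def __init__(self, d=256, heads=8, ff=1024, drop=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, heads, dropout=drop, batch_first=True)
        self.ln2 = nn.LayerNorm(d)
        self.mlp = nn.Sequential(
            nn.Linear(d, ff), nn.GELU(), nn.Linear(ff, d), nn.Dropout(drop)
        )

    def forward(self, x, mask):
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a
        return x + self.mlp(self.ln2(x))

class VerySmollGPT(nn.Module):
    def __init__(self, vocab=104, d=256, layers=6, heads=8, ff=1024, ctx=128, drop=0.1):
        super().__init__()
        self.tok = nn.Embedding(vocab, d)      # token embeddings
        self.pos = nn.Embedding(ctx, d)        # learned positional embeddings (assumed)
        self.drop = nn.Dropout(drop)
        self.blocks = nn.ModuleList(Block(d, heads, ff, drop) for _ in range(layers))
        self.ln_f = nn.LayerNorm(d)
        self.head = nn.Linear(d, vocab, bias=False)
        self.head.weight = self.tok.weight     # weight tying: embeddings ↔ output layer

    def forward(self, idx):
        T = idx.size(1)
        # Causal mask: True above the diagonal blocks attention to future positions
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=idx.device), 1)
        x = self.drop(self.tok(idx) + self.pos(torch.arange(T, device=idx.device)))
        for block in self.blocks:
            x = block(x, mask)
        return self.head(self.ln_f(x))

With these dimensions, the unique parameter count works out to roughly 4.8M, consistent with the table above.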

Training Details

Training Data

  • Dataset: TinyStories
  • Dataset Size: ~25MB (optimized for Raspberry Pi)
  • Total Tokens: ~25M (character-level, so one token per character)
  • Train/Val Split: 90/10
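
For concreteness, a sketch of that split on the raw character stream (a single contiguous 90/10 slice is an assumption, and the filename is hypothetical):

with open('tinystories.txt', 'r', encoding='utf-8') as f:  # hypothetical filename
    data = f.read()

n = int(0.9 * len(data))                   # 90/10 train/val split
train_data, val_data = data[:n], data[n:]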

Training Procedure

Hardware:

  • Raspberry Pi 5
  • CPU-only training (no GPU)
  • Training time: ~9 days

Hyperparameters:

  • Epochs: 3
  • Batch Size: 16
  • Learning Rate: 3e-4 (initial)
  • Min Learning Rate: 1e-4 (cosine annealing)
  • Optimizer: AdamW (β₁=0.9, β₂=0.95)
  • Weight Decay: 0.01
  • Gradient Clipping: 1.0
  • Max Batches per Epoch: 130,000
  • Context Window: 128 tokens
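
Put together, a training step consistent with these hyperparameters might look like the sketch below. The names model, train_loader, and the loss wiring are hypothetical placeholders; the actual training script lives in the repository:

import torch
import torch.nn as nn

optimizer = torch.optim.AdamW(
    model.parameters(), lr=3e-4, betas=(0.9, 0.95), weight_decay=0.01
)
# Cosine annealing from 3e-4 down to a floor of 1e-4 over all steps
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=390_000, eta_min=1e-4
)
criterion = nn.CrossEntropyLoss()

for epoch in range(3):
    for step, (x, y) in enumerate(train_loader):   # x, y: (16, 128) token ids
        if step >= 130_000:                        # max batches per epoch
            break
        logits = model(x)                          # (16, 128, 104)
        loss = criterion(logits.view(-1, logits.size(-1)), y.view(-1))
        optimizer.zero_grad()
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping
        optimizer.step()
        scheduler.step()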

Training Stats:

  • Final Epoch: 2 (zero-indexed; the saved checkpoint is from the third and final epoch)
  • Global Steps: 390,000
  • Best Validation Loss: 0.692

Tokenization

Character-level tokenization with 104 unique tokens:

  • 100 regular characters (letters, numbers, punctuation, special characters)
  • 4 special tokens: <PAD>, <UNK>, <BOS>, <EOS>
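
As a sketch, the lookup tables used in the examples below could be built like this (the index ordering is an assumption; the repository's tokenizer defines the real mapping):

SPECIALS = ['<PAD>', '<UNK>', '<BOS>', '<EOS>']

def build_vocab(corpus):
    chars = sorted(set(corpus))                # ~100 regular characters
    tokens = SPECIALS + chars                  # 104 tokens in total
    char_to_idx = {t: i for i, t in enumerate(tokens)}
    idx_to_char = {i: t for t, i in char_to_idx.items()}
    return char_to_idx, idx_to_char

def encode(text, char_to_idx):
    unk = char_to_idx['<UNK>']                 # unknown characters fall back to <UNK>
    return [char_to_idx.get(c, unk) for c in text]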

Usage

Installation

pip install torch safetensors

Loading the Model

import json

import torch
import torch.nn as nn
from safetensors.torch import load_file

# Load model weights
state_dict = load_file('model.safetensors')

# Load configuration
with open('config.json', 'r') as f:
    config = json.load(f)

# Note: You'll need to implement the VerySmollGPT architecture
# or use the original model.py from the repository
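
Assuming the architecture sketch from the Model Architecture section above, and that config.json stores the hyperparameters under these key names (an assumption), instantiation could look like:

model = VerySmollGPT(
    vocab=config.get('vocab_size', 104),    # config key names are assumptions
    d=config.get('n_embd', 256),
    layers=config.get('n_layer', 6),
    heads=config.get('n_head', 8),
    ff=config.get('ffn_dim', 1024),
    ctx=config.get('block_size', 128),
)
model.load_state_dict(state_dict)
model.eval()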

Text Generation Example

# Assuming you have the model loaded
model.eval()

# Encode your prompt (character-level)
prompt = "Once upon a time"
input_ids = [char_to_idx[c] for c in prompt]
input_tensor = torch.tensor([input_ids], dtype=torch.long)

# Generate
with torch.no_grad():
    output_ids = model.generate(
        input_tensor,
        max_new_tokens=200,
        temperature=0.8,
        top_k=40
    )

# Decode output
generated_text = ''.join([idx_to_char[i] for i in output_ids[0].tolist()])
print(generated_text)
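
model.generate is provided by the repository's model.py. If you implement the model yourself, a minimal sampling loop with the same temperature and top-k parameters (assumed to match the original's semantics) could be:

import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens=200, temperature=0.8, top_k=40, ctx=128):
    for _ in range(max_new_tokens):
        logits = model(idx[:, -ctx:])[:, -1, :] / temperature  # last-position logits
        if top_k is not None:
            v, _ = torch.topk(logits, top_k)
            logits[logits < v[:, [-1]]] = float('-inf')        # keep only top-k
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)      # sample one token
        idx = torch.cat([idx, next_id], dim=1)
    return idx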

Example Outputs

Prompt: "Once upon a time"

Generated:

Once upon a time, there was a little girl named Lily. She loved to play with her toys and her favorite was a penguin that had a shiny metal box on it. Timmy liked to...

Prompt: "The quick brown fox"

Generated:

The quick brown fox wanted to play with him again. The fox said he was not fair anymore. He said he was sorry and that he learned his lesson...

Limitations and Bias

  • Character-level tokenization: Less efficient than BPE/WordPiece for longer texts
  • Small context window: 128 tokens limits long-range dependencies
  • Training data: Limited to TinyStories dataset style (simple children's stories)
  • Vocabulary: Only 104 characters; out-of-vocabulary Unicode characters map to <UNK>
  • Coherence: Best for short-form text generation (stories, snippets)

Environmental Impact

This model was intentionally trained on a Raspberry Pi 5 to demonstrate low-power AI training:

  • Hardware: Raspberry Pi 5 (CPU only, ~15W power consumption)
  • Training Duration: ~9 days
  • Estimated Energy: ~3.24 kWh total (~15 W × 9 days ≈ 216 h)
  • Carbon Footprint: Minimal compared to GPU-based training

Technical Specifications

  • Model Size: 19 MB (safetensors format)
  • Inference Memory: ~200-300 MB RAM
  • Training Memory: ~1-2 GB RAM (batch_size=16)
  • Precision: FP32

Acknowledgments

GitHub
