
Qwen3 8M Model with Falcon-H1-0.5B-Instruct Tokenizer

Model Description

This model combines the Qwen3 architecture with the Falcon-H1-0.5B-Instruct tokenizer (32K vocabulary). Despite the "8M" name, the actual parameter count is 2,183,552 (~2.2M).

  • Architecture: Qwen3 (Grouped Query Attention, RMS Normalization, Q/K Normalization, RoPE)
  • Tokenizer: Falcon-H1-0.5B-Instruct (32K vocab)
  • Parameters: 2,183,552
  • Precision: BF16
  • Format: SafeTensors
  • Vocabulary Size: 32768

Configuration

  • vocab_size: 32768
  • hidden_size: 64
  • num_attention_heads: 4
  • num_key_value_heads: 2
  • num_hidden_layers: 2
  • intermediate_size: 160
  • head_dim: 16
  • max_position_embeddings: 4096
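
The parameter count of 2,183,552 can be derived from the configuration above. The sketch below assumes tied input/output embeddings and bias-free projections with per-head Q/K RMSNorm (standard for Qwen3, but not stated explicitly in this card):

```python
# Derive the total parameter count from the configuration.
# Assumptions (not stated in the card): tied embeddings, no projection
# biases, RMSNorm Q/K normalization over head_dim.
vocab_size = 32768
hidden = 64
n_heads = 4
n_kv_heads = 2
n_layers = 2
intermediate = 160
head_dim = 16

embedding = vocab_size * hidden  # shared with lm_head when tied

attn = (
    hidden * n_heads * head_dim        # q_proj
    + hidden * n_kv_heads * head_dim   # k_proj
    + hidden * n_kv_heads * head_dim   # v_proj
    + n_heads * head_dim * hidden      # o_proj
    + 2 * head_dim                     # q_norm + k_norm
)
mlp = 3 * hidden * intermediate        # gate_proj, up_proj, down_proj
norms = 2 * hidden                     # input + post-attention RMSNorm
per_layer = attn + mlp + norms

total = embedding + n_layers * per_layer + hidden  # + final RMSNorm
print(total)  # 2183552
```

The sum matches the reported 2,183,552 exactly, which confirms the embedding table is tied to the output head (an untied head would add another 2,097,152 parameters).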

Special Tokens

  • BOS: <|begin_of_text|> (id: 17)
  • EOS: <|end_of_text|> (id: 11)
  • PAD: <|pad|> (id: 0)

Usage

import torch
from transformers import Qwen3ForCausalLM, AutoTokenizer

model = Qwen3ForCausalLM.from_pretrained("./workspace/qwen3-8m-falcon-tokenizer")
tokenizer = AutoTokenizer.from_pretrained("./workspace/qwen3-8m-falcon-tokenizer")

# Generate text
inputs = tokenizer("Hello, world!", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Batch processing (start small)
texts = ["Hello", "How are you", "Good morning"]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

Important Notes

  • Qwen3 architecture paired with the Falcon tokenizer (32K vocabulary)
  • All token IDs must be < 32768; out-of-range IDs index past the embedding table and trigger CUDA errors
  • Start with small batch sizes (1-4) and increase gradually
  • Use padding when batching to avoid dimension mismatches
  • The model is initialized with random weights and requires fine-tuning before it produces useful output
  • Compatible with Qwen3 APIs but uses the Falcon vocabulary
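
The out-of-range warning above can be enforced with a simple guard before IDs reach the model. `validate_ids` below is a hypothetical helper, not part of the released model:

```python
# Sketch: reject token IDs that would index past the 32768-entry
# embedding table. `validate_ids` is a hypothetical helper.
VOCAB_SIZE = 32768

def validate_ids(ids, vocab_size=VOCAB_SIZE):
    """Raise ValueError if any token ID falls outside [0, vocab_size)."""
    bad = [i for i in ids if not 0 <= i < vocab_size]
    if bad:
        raise ValueError(f"token IDs out of range for {vocab_size}-entry vocab: {bad}")
    return ids

validate_ids([0, 11, 17, 32767])  # all in range, returns the list unchanged
```

Running this check on tokenizer output before calling the model turns a cryptic CUDA-side assertion into a clear Python error.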