
Qwen3 Model with Falcon Tokenizer - Usage Guide

Model Details

  • Architecture: Qwen3 (Grouped Query Attention, RMS Norm, Q/K Norm, RoPE)
  • Tokenizer: Falcon-H1-0.5B-Instruct (32K vocabulary)
  • Special Tokens:
    • BOS: <|begin_of_text|> (id: 17)
    • EOS: <|end_of_text|> (id: 11)
    • PAD: <|pad|> (id: 0)

Important Notes

  1. This model combines the Qwen3 architecture with the Falcon tokenizer
  2. The vocabulary size is 32K (32,768 entries, the Falcon standard)
  3. The model uses Qwen3-specific features such as q_norm/k_norm layers
  4. All token IDs must fall within the range 0-32767 (inclusive)
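
The range requirement in note 4 can be checked before a batch reaches the model. The following is a minimal sketch in plain Python, assuming token IDs are available as nested lists of integers; the helper name `find_invalid_token_ids` is hypothetical and not part of this repository.

```python
VOCAB_SIZE = 32768  # 32K vocabulary, so valid IDs are 0..32767 (per this guide)


def find_invalid_token_ids(input_ids, vocab_size=VOCAB_SIZE):
    """Return (row, column, token_id) for every ID outside [0, vocab_size).

    Hypothetical helper for illustration; `input_ids` is a list of
    token-ID lists, one inner list per sequence.
    """
    invalid = []
    for row, sequence in enumerate(input_ids):
        for col, token_id in enumerate(sequence):
            if not 0 <= token_id < vocab_size:
                invalid.append((row, col, token_id))
    return invalid


# Example: the second sequence contains two out-of-range IDs.
batch = [[17, 5, 11], [17, 32768, -1]]
for row, col, token_id in find_invalid_token_ids(batch):
    print(f"sequence {row}, position {col}: invalid token id {token_id}")
```

Running a check like this on the CPU gives a readable report of the offending positions, which is usually easier to act on than an error raised on the GPU.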

Batch Processing Tips

  • Use conservative batch sizes (start with 1-4) and increase them only while memory allows
  • Pad all sequences in a batch to the same length
  • Monitor CUDA memory usage
  • Wrap inference in torch.no_grad() so that no gradients are stored
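
The first two tips can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the repository's own code: the helper names `iter_batches` and `pad_batch` are hypothetical, the PAD id of 0 comes from the Model Details section, and the padding side is left as a parameter because this guide does not specify one.

```python
PAD_ID = 0  # <|pad|>, per the Model Details section


def iter_batches(sequences, batch_size=2):
    """Yield consecutive slices of at most `batch_size` sequences."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(sequences), batch_size):
        yield sequences[start:start + batch_size]


def pad_batch(sequences, pad_id=PAD_ID, side="right"):
    """Pad token-ID lists to a common length and build attention masks.

    Returns (input_ids, attention_mask); mask entries are 1 for real
    tokens and 0 for padding. The padding side is an assumption left
    to the caller.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    max_len = max((len(seq) for seq in sequences), default=0)
    input_ids, attention_mask = [], []
    for seq in sequences:
        padding = [pad_id] * (max_len - len(seq))
        mask_padding = [0] * len(padding)
        real = [1] * len(seq)
        if side == "right":
            input_ids.append(list(seq) + padding)
            attention_mask.append(real + mask_padding)
        else:
            input_ids.append(padding + list(seq))
            attention_mask.append(mask_padding + real)
    return input_ids, attention_mask


# Example: process three sequences in batches of two.
data = [[17, 5, 11], [17, 11], [17, 8, 9, 11]]
for chunk in iter_batches(data, batch_size=2):
    ids, mask = pad_batch(chunk)
    print(ids, mask)
```

The padded lists and masks can then be converted to tensors and passed to the model inside a torch.no_grad() block.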

Common Issues

  • If CUDA errors occur, check that all token IDs are < 32768
  • Pad sequences with the <|pad|> token (id: 0)
  • Tokenize every sequence in a batch with the same tokenizer and settings
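
A padded batch can be sanity-checked for the padding problems listed above before it is sent to the GPU. The sketch below is one possible check, assuming the batch is held as nested Python lists together with a 0/1 attention mask; `check_padding` is a hypothetical name, and the PAD id of 0 is taken from the Model Details section.

```python
PAD_ID = 0  # <|pad|>, per the Model Details section


def check_padding(input_ids, attention_mask, pad_id=PAD_ID):
    """Return a list of human-readable problems found in a padded batch.

    Hypothetical helper for illustration. It reports rows of unequal
    length, rows whose mask length differs from the ID length, and
    masked-out positions that do not hold the pad token.
    """
    problems = []
    lengths = {len(row) for row in input_ids}
    if len(lengths) > 1:
        problems.append(f"rows have differing lengths: {sorted(lengths)}")
    for r, (ids, mask) in enumerate(zip(input_ids, attention_mask)):
        if len(ids) != len(mask):
            problems.append(f"row {r}: {len(ids)} ids but {len(mask)} mask entries")
            continue
        for c, (token_id, m) in enumerate(zip(ids, mask)):
            if m == 0 and token_id != pad_id:
                problems.append(
                    f"row {r}, col {c}: masked position holds id {token_id}, "
                    f"expected {pad_id}"
                )
    return problems


# Example: the second row was padded with the wrong token id.
for problem in check_padding([[17, 5, 11], [17, 11, 3]], [[1, 1, 1], [1, 1, 0]]):
    print(problem)
```

An empty result means the batch is rectangular and every masked position holds the pad token; it does not by itself confirm that token IDs are within the vocabulary range.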