# Qwen3 Model with Falcon Tokenizer - Usage Guide

## Model Details

- **Architecture**: Qwen3 (Grouped Query Attention, RMS Norm, Q/K Norm, RoPE)
- **Tokenizer**: Falcon-H1-0.5B-Instruct (32K vocabulary)
- **Special Tokens**:
  - BOS: `<|begin_of_text|>` (id: 17)
  - EOS: `<|end_of_text|>` (id: 11)
  - PAD: `<|pad|>` (id: 0)

## Important Notes

1. This model combines the Qwen3 architecture with the Falcon tokenizer; verify the pairing with the tokenizer check below.
2. The vocabulary size is 32K (Falcon standard).
3. The model uses Qwen3-specific features such as q_norm/k_norm layers.
4. All token IDs must fall within the 0-32767 range.

## Batch Processing Tips

- Use conservative batch sizes (start with 1-4).
- Ensure all sequences in a batch are padded to the same length.
- Monitor CUDA memory usage.
- Wrap inference in `torch.no_grad()` (see the batch inference sketch below).

## Common Issues

- If CUDA errors occur (often as opaque device-side asserts), check that all token IDs are < 32768 (see the validation sketch below).
- Pad with the `<|pad|>` token (id: 0), not an arbitrary value.
- Tokenize every sequence in a batch with the same tokenizer and settings.
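## Verifying the Tokenizer

A minimal sketch for confirming that the tokenizer you load matches the special-token IDs this guide assumes. The Hugging Face repo id `tiiuae/Falcon-H1-0.5B-Instruct` below is illustrative; substitute the path where your copy of the tokenizer actually lives.

```python
from transformers import AutoTokenizer

# Illustrative repo id; substitute the actual location of your copy of the
# Falcon-H1-0.5B-Instruct tokenizer.
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon-H1-0.5B-Instruct")

# Each printed value should match the guide's assumptions.
print("vocab size:", tokenizer.vocab_size)                   # expected: 32768
print("BOS:", tokenizer.bos_token, tokenizer.bos_token_id)   # expected: <|begin_of_text|> 17
print("EOS:", tokenizer.eos_token, tokenizer.eos_token_id)   # expected: <|end_of_text|> 11
print("PAD:", tokenizer.pad_token, tokenizer.pad_token_id)   # expected: <|pad|> 0
```

If any printed value differs from the expected one, the checkpoint and tokenizer are mismatched, and the token-ID assumptions elsewhere in this guide will not hold.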
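## Batch Inference Sketch

A sketch of padded batch inference under `torch.no_grad()`, assuming the combined checkpoint loads through `AutoModelForCausalLM`; the loading path for this hybrid model may differ in your setup, and both `from_pretrained` paths below are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder paths: point these at your tokenizer and combined checkpoint.
tokenizer = AutoTokenizer.from_pretrained("path/to/falcon-h1-tokenizer")
model = AutoModelForCausalLM.from_pretrained("path/to/qwen3-falcon-checkpoint")
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).eval()

prompts = ["Hello, world!", "Batching requires padding shorter sequences."]

# Tokenizing the whole batch at once pads every sequence to the longest
# one with the <|pad|> token (id 0) and returns a matching attention mask.
batch = tokenizer(prompts, return_tensors="pt", padding=True).to(device)

# no_grad() skips activation storage for backprop, keeping CUDA memory
# low enough for the conservative batch sizes (1-4) recommended above.
with torch.no_grad():
    outputs = model(**batch)

print(outputs.logits.shape)  # (batch_size, seq_len, vocab_size=32768)
```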
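## Validating Token IDs Before GPU Transfer

Out-of-range token IDs only fail inside the CUDA embedding lookup, where the error surfaces as an unhelpful device-side assert. A small check on the CPU tensor catches the problem early; `validate_token_ids` is a hypothetical helper written for this guide, not part of any library.

```python
import torch

VOCAB_SIZE = 32768  # Falcon tokenizer: valid IDs are 0..32767

def validate_token_ids(input_ids: torch.Tensor) -> None:
    # Hypothetical helper: run on the CPU tensor *before* .to("cuda") so a
    # bad ID raises a readable ValueError instead of a device-side assert.
    out_of_range = (input_ids < 0) | (input_ids >= VOCAB_SIZE)
    if out_of_range.any():
        bad = input_ids[out_of_range].unique().tolist()
        raise ValueError(f"Token IDs outside [0, {VOCAB_SIZE - 1}]: {bad}")

# Usage: validate on CPU first, then move the batch to the GPU.
ids = torch.tensor([[17, 1234, 5678, 11]])
validate_token_ids(ids)
```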