# Qwen3 Model with Falcon Tokenizer - Usage Guide

## Model Details

- **Architecture**: Qwen3 (Grouped Query Attention, RMS Norm, Q/K Norm, RoPE)
- **Tokenizer**: Falcon-H1-0.5B-Instruct (32K vocabulary)
- **Special Tokens**:
  - BOS: `<|begin_of_text|>` (id: 17)
  - EOS: `<|end_of_text|>` (id: 11)
  - PAD: `<|pad|>` (id: 0)

## Important Notes

1. This model combines the Qwen3 architecture with the Falcon tokenizer; verify the pairing with the tokenizer check below.
2. The vocabulary size is 32K (Falcon standard).
3. The model uses Qwen3-specific features such as q_norm/k_norm layers.
4. All token IDs must fall within the 0-32767 range.

## Batch Processing Tips

- Use conservative batch sizes (start with 1-4).
- Ensure all sequences in a batch are padded to the same length.
- Monitor CUDA memory usage.
- Wrap inference in `torch.no_grad()` (see the batch inference sketch below).

## Common Issues

- If CUDA errors occur (often as opaque device-side asserts), check that all token IDs are < 32768 (see the validation sketch below).
- Pad with the `<|pad|>` token (id: 0), not an arbitrary value.
- Tokenize every sequence in a batch with the same tokenizer and settings.
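## Verifying the Tokenizer

A minimal sketch for confirming that the tokenizer you load matches the special-token IDs this guide assumes. The Hugging Face repo id `tiiuae/Falcon-H1-0.5B-Instruct` below is illustrative; substitute the path where your copy of the tokenizer actually lives.

```python
from transformers import AutoTokenizer

# Illustrative repo id; substitute the actual location of your copy of the
# Falcon-H1-0.5B-Instruct tokenizer.
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon-H1-0.5B-Instruct")

# Each printed value should match the guide's assumptions.
print("vocab size:", tokenizer.vocab_size)                   # expected: 32768
print("BOS:", tokenizer.bos_token, tokenizer.bos_token_id)   # expected: <|begin_of_text|> 17
print("EOS:", tokenizer.eos_token, tokenizer.eos_token_id)   # expected: <|end_of_text|> 11
print("PAD:", tokenizer.pad_token, tokenizer.pad_token_id)   # expected: <|pad|> 0
```

If any printed value differs from the expected one, the checkpoint and tokenizer are mismatched, and the token-ID assumptions elsewhere in this guide will not hold.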
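## Batch Inference Sketch

A sketch of padded batch inference under `torch.no_grad()`, assuming the combined checkpoint loads through `AutoModelForCausalLM`; the loading path for this hybrid model may differ in your setup, and both `from_pretrained` paths below are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder paths: point these at your tokenizer and combined checkpoint.
tokenizer = AutoTokenizer.from_pretrained("path/to/falcon-h1-tokenizer")
model = AutoModelForCausalLM.from_pretrained("path/to/qwen3-falcon-checkpoint")
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).eval()

prompts = ["Hello, world!", "Batching requires padding shorter sequences."]

# Tokenizing the whole batch at once pads every sequence to the longest
# one with the <|pad|> token (id 0) and returns a matching attention mask.
batch = tokenizer(prompts, return_tensors="pt", padding=True).to(device)

# no_grad() skips activation storage for backprop, keeping CUDA memory
# low enough for the conservative batch sizes (1-4) recommended above.
with torch.no_grad():
    outputs = model(**batch)

print(outputs.logits.shape)  # (batch_size, seq_len, vocab_size=32768)
```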
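## Validating Token IDs Before GPU Transfer

Out-of-range token IDs only fail inside the CUDA embedding lookup, where the error surfaces as an unhelpful device-side assert. A small check on the CPU tensor catches the problem early; `validate_token_ids` is a hypothetical helper written for this guide, not part of any library.

```python
import torch

VOCAB_SIZE = 32768  # Falcon tokenizer: valid IDs are 0..32767

def validate_token_ids(input_ids: torch.Tensor) -> None:
    # Hypothetical helper: run on the CPU tensor *before* .to("cuda") so a
    # bad ID raises a readable ValueError instead of a device-side assert.
    out_of_range = (input_ids < 0) | (input_ids >= VOCAB_SIZE)
    if out_of_range.any():
        bad = input_ids[out_of_range].unique().tolist()
        raise ValueError(f"Token IDs outside [0, {VOCAB_SIZE - 1}]: {bad}")

# Usage: validate on CPU first, then move the batch to the GPU.
ids = torch.tensor([[17, 1234, 5678, 11]])
validate_token_ids(ids)
```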