# Qwen3 Model with Falcon Tokenizer - Usage Guide
## Model Details
- **Architecture**: Qwen3 (Grouped Query Attention, RMS Norm, Q/K Norm, RoPE)
- **Tokenizer**: Falcon-H1-0.5B-Instruct (32K vocabulary)
- **Special Tokens**:
  - BOS: `<|begin_of_text|>` (id: 17)
  - EOS: `<|end_of_text|>` (id: 11)
  - PAD: `<|pad|>` (id: 0)
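
A minimal loading sketch, assuming standard `transformers` APIs; the model path below is a placeholder for wherever this checkpoint lives. It also verifies the special-token IDs listed above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/qwen3-falcon-model"  # placeholder: substitute your checkpoint path

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16)

# Confirm the special-token IDs match the list above.
assert tokenizer.bos_token_id == 17  # <|begin_of_text|>
assert tokenizer.eos_token_id == 11  # <|end_of_text|>
assert tokenizer.pad_token_id == 0   # <|pad|>
```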
## Important Notes
1. This model combines the Qwen3 architecture with the Falcon-H1 tokenizer
2. The vocabulary size is 32K (Falcon standard)
3. The model uses Qwen3-specific features such as the q_norm/k_norm layers
4. All token IDs must fall within the 0-32767 range (see the check below)
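
Notes 2 and 4 can be checked mechanically. A sketch (reusing the placeholder path) that asserts the config reports a 32K vocabulary and that a tokenized sample stays inside the valid ID range:

```python
from transformers import AutoConfig, AutoTokenizer

MODEL_PATH = "path/to/qwen3-falcon-model"  # placeholder path

config = AutoConfig.from_pretrained(MODEL_PATH)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)

assert config.vocab_size == 32768  # 32K Falcon vocabulary

ids = tokenizer("A quick tokenization sanity check.")["input_ids"]
assert all(0 <= i <= 32767 for i in ids), f"out-of-range token id: {max(ids)}"
```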
## Batch Processing Tips
- Use conservative batch sizes (start with 1-4)
- Ensure all sequences are properly padded
- Monitor CUDA memory usage
- Use `torch.no_grad()` for inference (see the sketch below)
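
Putting these tips together, a batch-inference sketch under the same placeholder-path assumption; left padding is the usual choice for decoder-only generation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/qwen3-falcon-model"  # placeholder path

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
tokenizer.padding_side = "left"  # decoder-only models should be left-padded for generate()
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH).to("cuda").eval()

prompts = ["Hello!", "Explain grouped query attention in one sentence."]
# padding=True pads the batch to its longest sequence with <|pad|> (id 0)
# and returns the matching attention_mask.
batch = tokenizer(prompts, padding=True, return_tensors="pt").to("cuda")

with torch.no_grad():  # no autograd bookkeeping during inference
    out = model.generate(**batch, max_new_tokens=32)

print(tokenizer.batch_decode(out, skip_special_tokens=True))
```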
## Common Issues
- If CUDA errors occur, check that all token IDs are < 32768 (see the helper below)
- Ensure proper padding with the `<|pad|>` token
- Use consistent tokenization settings across batches
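
A small diagnostic helper (hypothetical, not part of the model's tooling) that surfaces out-of-range IDs on the CPU before they trigger an opaque device-side assert on the GPU:

```python
import torch

def check_ids(input_ids: torch.Tensor, vocab_size: int = 32768) -> None:
    """Raise a readable error instead of a CUDA device-side assert."""
    bad = input_ids[(input_ids < 0) | (input_ids >= vocab_size)]
    if bad.numel() > 0:
        raise ValueError(
            f"{bad.numel()} token id(s) outside [0, {vocab_size}): {bad.tolist()[:10]}"
        )

# Usage: call on the CPU tensor before moving the batch to CUDA, e.g.
# check_ids(batch["input_ids"])
```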