# Qwen3 Model with Falcon Tokenizer - Usage Guide
## Model Details
- Architecture: Qwen3 (Grouped Query Attention, RMS Norm, Q/K Norm, RoPE)
- Tokenizer: Falcon-H1-0.5B-Instruct (32K vocabulary)
- Special Tokens:
  - BOS: `<|begin_of_text|>` (id: 17)
  - EOS: `<|end_of_text|>` (id: 11)
  - PAD: `<|pad|>` (id: 0)
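
A minimal loading sketch is shown below. The repository path is a placeholder (substitute the actual model directory or hub id), and the dtype/device settings are illustrative rather than required.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: replace with this repository's hub id or a local directory.
model_id = "path/to/qwen3-falcon-tokenizer"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; use float16/float32 as needed
    device_map="auto",           # requires `accelerate`
)

# Confirm the Falcon special tokens listed above.
print(tokenizer.bos_token, tokenizer.bos_token_id)  # <|begin_of_text|> 17
print(tokenizer.eos_token, tokenizer.eos_token_id)  # <|end_of_text|> 11
print(tokenizer.pad_token, tokenizer.pad_token_id)  # <|pad|> 0
```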
## Important Notes
- This model combines the Qwen3 architecture with the Falcon-H1 tokenizer
- The vocabulary size is 32K (32,768 tokens, the Falcon standard)
- The model uses Qwen3-specific features such as the q_norm/k_norm layers
- All token IDs must fall within the 0-32767 range (a quick check is sketched below)
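
As a quick sanity check, the sketch below (reusing `model` and `tokenizer` from the loading example) confirms the 32K vocabulary and the presence of the q_norm/k_norm layers. The attribute path follows the Hugging Face `transformers` Qwen3 implementation; adjust it if your model class differs.

```python
# Vocabulary size should agree between the config and the tokenizer.
print(model.config.vocab_size)  # expected: 32768, per the notes above
print(len(tokenizer))           # should match the config value

# Qwen3 applies RMSNorm to the query/key projections inside each attention block.
first_attn = model.model.layers[0].self_attn
print(type(first_attn.q_norm).__name__)  # e.g. Qwen3RMSNorm
print(type(first_attn.k_norm).__name__)
```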
## Batch Processing Tips
- Use conservative batch sizes (start with 1-4)
- Ensure all sequences in a batch are padded to the same length with the `<|pad|>` token
- Monitor CUDA memory usage
- Wrap inference in torch.no_grad() (see the sketch after this list)
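
The sketch below puts these tips together for a small batched generation pass, reusing `model` and `tokenizer` from the loading example. The prompts, batch size, and generation settings are illustrative only.

```python
import torch

prompts = [
    "Explain grouped query attention in one sentence.",
    "What does RMS normalization do?",
]

# Left padding is the usual choice for decoder-only generation.
tokenizer.padding_side = "left"

batch = tokenizer(
    prompts,
    return_tensors="pt",
    padding=True,      # pads shorter sequences with <|pad|> (id 0)
    truncation=True,
).to(model.device)

with torch.no_grad():  # no gradients are needed for inference
    output_ids = model.generate(
        **batch,
        max_new_tokens=64,
        pad_token_id=tokenizer.pad_token_id,
    )

for text in tokenizer.batch_decode(output_ids, skip_special_tokens=True):
    print(text)

# Keep an eye on memory when increasing the batch size.
if torch.cuda.is_available():
    print(f"peak CUDA memory: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```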
## Common Issues
- If CUDA errors occur (often device-side indexing asserts from the embedding layer), check that all token IDs are < 32768; a debugging sketch follows this list
- Ensure batches are padded with the `<|pad|>` token (id: 0)
- Use the same tokenizer settings (padding, truncation, special tokens) for every sequence in a batch
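
If a batch triggers a device-side assert, a check like the one below (a hypothetical helper, run before moving tensors to the GPU) can locate any out-of-range token IDs:

```python
import torch

def find_out_of_range(input_ids: torch.Tensor, vocab_size: int = 32768) -> list:
    """Return (row, column) positions of token ids outside [0, vocab_size)."""
    mask = (input_ids < 0) | (input_ids >= vocab_size)
    return mask.nonzero(as_tuple=False).tolist()

# Example: check a tokenized batch (e.g. `batch` from the batching sketch above).
bad_positions = find_out_of_range(batch["input_ids"].cpu())
if bad_positions:
    print("out-of-range token ids at positions:", bad_positions)
else:
    print("all token ids are within the 32K vocabulary")
```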