# Qwen3 Model with Falcon Tokenizer - Usage Guide

## Model Details
- **Architecture**: Qwen3 (Grouped Query Attention, RMS Norm, Q/K Norm, RoPE)
- **Tokenizer**: Falcon-H1-0.5B-Instruct (32K vocabulary)
- **Special Tokens**:
  - BOS: `<|begin_of_text|>` (id: 17)
  - EOS: `<|end_of_text|>` (id: 11)
  - PAD: `<|pad|>` (id: 0)
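
A minimal loading sketch using the Hugging Face `transformers` auto classes; the local path `./qwen3-falcon-tokenizer` is a placeholder, substitute your actual model directory:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "./qwen3-falcon-tokenizer"  # hypothetical path -- use your model directory
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.bfloat16)
model.eval()

# Sanity-check that the special tokens match the table above.
print(tokenizer.bos_token, tokenizer.bos_token_id)  # expect <|begin_of_text|> 17
print(tokenizer.eos_token, tokenizer.eos_token_id)  # expect <|end_of_text|> 11
print(tokenizer.pad_token, tokenizer.pad_token_id)  # expect <|pad|> 0
```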

## Important Notes
1. This model combines the Qwen3 architecture with the Falcon tokenizer
2. The vocabulary size is 32K (the Falcon standard)
3. The model uses Qwen3-specific features such as q_norm/k_norm layers
4. All token IDs must fall within the 0-32767 range (see the validation sketch below)
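
Continuing from the loading sketch above, `validate_token_ids` is a hypothetical helper (not part of any library) that enforces note 4 before a batch reaches the GPU, where an out-of-range index would only surface as an opaque CUDA assertion:

```python
def validate_token_ids(input_ids, vocab_size=32768):
    # Collect any IDs outside the 32K Falcon vocabulary.
    bad = [t for t in input_ids if not 0 <= t < vocab_size]
    if bad:
        raise ValueError(f"token IDs out of range [0, {vocab_size}): {bad[:10]}")
    return input_ids

ids = tokenizer("Hello, world!")["input_ids"]
validate_token_ids(ids)
```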

## Batch Processing Tips
- Use conservative batch sizes (start with 1-4)
- Ensure all sequences are properly padded
- Monitor CUDA memory usage
- Wrap inference in `torch.no_grad()` (see the sketch after this list)
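
A batched-generation sketch that pulls these tips together, reusing the `tokenizer` and `model` loaded earlier. Left padding is an assumption on my part (the usual choice for decoder-only generation), not something this guide mandates:

```python
prompts = [
    "Explain RoPE in one sentence.",
    "What is grouped query attention?",
]

tokenizer.padding_side = "left"  # assumed; typical for causal LM generation
batch = tokenizer(prompts, return_tensors="pt", padding=True)

with torch.no_grad():  # inference only -- no gradient buffers
    outputs = model.generate(
        **batch,
        max_new_tokens=64,
        pad_token_id=tokenizer.pad_token_id,  # <|pad|> (id: 0)
    )

for text in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(text)
```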

## Common Issues
- If CUDA errors occur, check that all token IDs are < 32768 (see the debugging sketch below)
- Ensure proper padding with the `<|pad|>` token
- Use consistent tokenization settings for all sequences in a batch
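
A debugging sketch for the first issue. CUDA errors are reported asynchronously, so the Python traceback often points at an unrelated line; the two standard tactics below assume the `model` and `batch` from the examples above:

```python
# 1. Launch with synchronous kernels for an accurate traceback
#    (environment variable, set before the process starts):
#
#    CUDA_LAUNCH_BLOCKING=1 python your_script.py

# 2. Replay the failing batch on CPU, where an out-of-range token ID raises
#    an eager, readable IndexError instead of a device-side assert.
cpu_model = model.to("cpu")  # moves the model in place; move it back to CUDA afterwards
cpu_batch = {k: v.to("cpu") for k, v in batch.items()}
with torch.no_grad():
    cpu_model(**cpu_batch)  # reproduces the failure with a clear error message
```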