# Qwen3 Model with Falcon Tokenizer - Usage Guide
## Model Details
- Architecture: Qwen3 (Grouped Query Attention, RMS Norm, Q/K Norm, RoPE)
- Tokenizer: Falcon-H1-0.5B-Instruct (32K vocabulary)
- Special Tokens:
  - BOS: `<|begin_of_text|>` (id: 17)
  - EOS: `<|end_of_text|>` (id: 11)
  - PAD: `<|pad|>` (id: 0)
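
A minimal loading sketch is shown below. The repository path is a placeholder (substitute the actual model directory or hub id), and the dtype/device settings are illustrative rather than required.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: replace with this repository's hub id or a local directory.
model_id = "path/to/qwen3-falcon-tokenizer"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; use float16/float32 as needed
    device_map="auto",           # requires `accelerate`
)

# Confirm the Falcon special tokens listed above.
print(tokenizer.bos_token, tokenizer.bos_token_id)  # <|begin_of_text|> 17
print(tokenizer.eos_token, tokenizer.eos_token_id)  # <|end_of_text|> 11
print(tokenizer.pad_token, tokenizer.pad_token_id)  # <|pad|> 0
```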
## Important Notes
- This model combines the Qwen3 architecture with the Falcon-H1 tokenizer
- The vocabulary size is 32K (32,768 tokens, the Falcon standard)
- The model uses Qwen3-specific features such as the q_norm/k_norm layers
- All token IDs must fall within the 0-32767 range (a quick check is sketched below)
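
As a quick sanity check, the sketch below (reusing `model` and `tokenizer` from the loading example) confirms the 32K vocabulary and the presence of the q_norm/k_norm layers. The attribute path follows the Hugging Face `transformers` Qwen3 implementation; adjust it if your model class differs.

```python
# Vocabulary size should agree between the config and the tokenizer.
print(model.config.vocab_size)  # expected: 32768, per the notes above
print(len(tokenizer))           # should match the config value

# Qwen3 applies RMSNorm to the query/key projections inside each attention block.
first_attn = model.model.layers[0].self_attn
print(type(first_attn.q_norm).__name__)  # e.g. Qwen3RMSNorm
print(type(first_attn.k_norm).__name__)
```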
## Batch Processing Tips
- Use conservative batch sizes (start with 1-4)
- Ensure all sequences in a batch are padded to the same length with the `<|pad|>` token
- Monitor CUDA memory usage
- Wrap inference in torch.no_grad() (see the sketch after this list)
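
The sketch below puts these tips together for a small batched generation pass, reusing `model` and `tokenizer` from the loading example. The prompts, batch size, and generation settings are illustrative only.

```python
import torch

prompts = [
    "Explain grouped query attention in one sentence.",
    "What does RMS normalization do?",
]

# Left padding is the usual choice for decoder-only generation.
tokenizer.padding_side = "left"

batch = tokenizer(
    prompts,
    return_tensors="pt",
    padding=True,      # pads shorter sequences with <|pad|> (id 0)
    truncation=True,
).to(model.device)

with torch.no_grad():  # no gradients are needed for inference
    output_ids = model.generate(
        **batch,
        max_new_tokens=64,
        pad_token_id=tokenizer.pad_token_id,
    )

for text in tokenizer.batch_decode(output_ids, skip_special_tokens=True):
    print(text)

# Keep an eye on memory when increasing the batch size.
if torch.cuda.is_available():
    print(f"peak CUDA memory: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```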
## Common Issues
- If CUDA errors occur (often device-side indexing asserts from the embedding layer), check that all token IDs are < 32768; a debugging sketch follows this list
- Ensure batches are padded with the `<|pad|>` token (id: 0)
- Use the same tokenizer settings (padding, truncation, special tokens) for every sequence in a batch
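
If a batch triggers a device-side assert, a check like the one below (a hypothetical helper, run before moving tensors to the GPU) can locate any out-of-range token IDs:

```python
import torch

def find_out_of_range(input_ids: torch.Tensor, vocab_size: int = 32768) -> list:
    """Return (row, column) positions of token ids outside [0, vocab_size)."""
    mask = (input_ids < 0) | (input_ids >= vocab_size)
    return mask.nonzero(as_tuple=False).tolist()

# Example: check a tokenized batch (e.g. `batch` from the batching sketch above).
bad_positions = find_out_of_range(batch["input_ids"].cpu())
if bad_positions:
    print("out-of-range token ids at positions:", bad_positions)
else:
    print("all token ids are within the 32K vocabulary")
```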