Multiscale Byte Language Models -- A Hierarchical Architecture for Causal Million-Length Sequence Modeling. Paper • 2502.14553 • Published Feb 20, 2025