Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper β’ 2502.11089 β’ Published Feb 16 β’ 165