Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention
Paper • 2509.23610 • Published Sep 28