Native Hybrid Attention for Efficient Sequence Modeling Paper • 2510.07019 • Published Oct 2025 • 16
CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models Paper • 2505.20767 • Published May 27, 2025 • 1
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models Paper • 2508.09834 • Published Aug 13, 2025 • 53
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond Paper • 2503.21614 • Published Mar 27, 2025 • 42
CO2: Efficient Distributed Training with Full Communication-Computation Overlap Paper • 2401.16265 • Published Jan 29, 2024 • 1
Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention Paper • 2405.17381 • Published May 27, 2024
LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training Paper • 2411.15708 • Published Nov 24, 2024
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14, 2025 • 298
LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid Paper • 2502.07563 • Published Feb 11, 2025 • 24
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models Paper • 2401.04658 • Published Jan 9, 2024 • 27