imamnurby's Collections

Attention in LLM

updated Mar 16, 2024

  • Simple linear attention language models balance the recall-throughput tradeoff

    Paper • 2402.18668 • Published Feb 28, 2024 • 20

  • BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences

    Paper • 2403.09347 • Published Mar 14, 2024 • 22