Collections including paper arxiv:2307.08691

Collection 1:
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  Paper • 2205.14135 • Published • 15
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 9
- FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
  Paper • 2407.08608 • Published • 1
- 1.58-bit FLUX
  Paper • 2412.18653 • Published • 84

Collection 2:
- Self-Play Preference Optimization for Language Model Alignment
  Paper • 2405.00675 • Published • 27
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  Paper • 2205.14135 • Published • 15
- Attention Is All You Need
  Paper • 1706.03762 • Published • 94
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 9

Collection 3:
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  Paper • 2205.14135 • Published • 15
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 9
- FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
  Paper • 2407.08608 • Published • 1
- Fast Transformer Decoding: One Write-Head is All You Need
  Paper • 1911.02150 • Published • 9

Collection 4:
- Attention Is All You Need
  Paper • 1706.03762 • Published • 94
- LLaMA: Open and Efficient Foundation Language Models
  Paper • 2302.13971 • Published • 18
- Efficient Tool Use with Chain-of-Abstraction Reasoning
  Paper • 2401.17464 • Published • 21
- MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
  Paper • 2407.21770 • Published • 22