Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2404.07413

XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference

Paper • 2404.15420 • Published Apr 23, 2024 • 11
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper • 2404.14619 • Published Apr 22, 2024 • 126
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22, 2024 • 258
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

Paper • 2404.14047 • Published Apr 22, 2024 • 45

Foundation Models

OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 84
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Paper • 2403.05530 • Published Mar 8, 2024 • 66
StarCoder: may the source be with you!

Paper • 2305.06161 • Published May 9, 2023 • 31
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Paper • 2312.15166 • Published Dec 23, 2023 • 60

Papers - Attention - Mixture of Attention Heads (MoA)

Generalized multi head using RoPE

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11, 2024 • 38

Papers - University - Princeton University

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11, 2024 • 38
Allowing humans to interactively guide machines where to look does not always improve a human-AI team's classification accuracy

Paper • 2404.05238 • Published Apr 8, 2024 • 3
Cognitive Architectures for Language Agents

Paper • 2309.02427 • Published Sep 5, 2023 • 8
Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings

Paper • 2305.13571 • Published May 23, 2023 • 2

Papers - Fine-tuning - DPO

Refer to additional papers: https://link.springer.com/article/10.1007/s10994-014-5458-8 and https://link.springer.com/article/10.1007/BF00992696

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 63
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization

Paper • 2402.09320 • Published Feb 14, 2024 • 6
sDPO: Don't Use Your Data All at Once

Paper • 2403.19270 • Published Mar 28, 2024 • 41
Dueling RL: Reinforcement Learning with Trajectory Preferences

Paper • 2111.04850 • Published Nov 8, 2021 • 2

Rho-1: Not All Tokens Are What You Need

Paper • 2404.07965 • Published Apr 11, 2024 • 93
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

Paper • 2404.10667 • Published Apr 16, 2024 • 22
Instruction-tuned Language Models are Better Knowledge Learners

Paper • 2402.12847 • Published Feb 20, 2024 • 26
DoRA: Weight-Decomposed Low-Rank Adaptation

Paper • 2402.09353 • Published Feb 14, 2024 • 29

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11, 2024 • 38
Rho-1: Not All Tokens Are What You Need

Paper • 2404.07965 • Published Apr 11, 2024 • 93
Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28, 2024 • 111
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2, 2024 • 107

Papers - Megatron

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11, 2024 • 38

Papers - University - MIT

One-step Diffusion with Distribution Matching Distillation

Paper • 2311.18828 • Published Nov 30, 2023 • 3
The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26, 2024 • 82
Condition-Aware Neural Network for Controlled Image Generation

Paper • 2404.01143 • Published Apr 1, 2024 • 13
Locating and Editing Factual Associations in GPT

Paper • 2202.05262 • Published Feb 10, 2022 • 1

Crystalcareai/GemMoE-Beta-1

Text Generation • Updated Mar 20, 2024 • 101 • 79
JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11, 2024 • 38
Multi-Head Mixture-of-Experts

Paper • 2404.15045 • Published Apr 23, 2024 • 60

XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference

Paper • 2404.15420 • Published Apr 23, 2024 • 11
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper • 2404.14619 • Published Apr 22, 2024 • 126
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22, 2024 • 258
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

Paper • 2404.14047 • Published Apr 22, 2024 • 45

Rho-1: Not All Tokens Are What You Need

Paper • 2404.07965 • Published Apr 11, 2024 • 93
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

Paper • 2404.10667 • Published Apr 16, 2024 • 22
Instruction-tuned Language Models are Better Knowledge Learners

Paper • 2402.12847 • Published Feb 20, 2024 • 26
DoRA: Weight-Decomposed Low-Rank Adaptation

Paper • 2402.09353 • Published Feb 14, 2024 • 29

Foundation Models

OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 84
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Paper • 2403.05530 • Published Mar 8, 2024 • 66
StarCoder: may the source be with you!

Paper • 2305.06161 • Published May 9, 2023 • 31
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Paper • 2312.15166 • Published Dec 23, 2023 • 60

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11, 2024 • 38
Rho-1: Not All Tokens Are What You Need

Paper • 2404.07965 • Published Apr 11, 2024 • 93
Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28, 2024 • 111
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2, 2024 • 107

Papers - Attention - Mixture of Attention Heads (MoA)

Generalized multi head using RoPE

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11, 2024 • 38

Papers - Megatron

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11, 2024 • 38

Papers - University - Princeton University

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11, 2024 • 38
Allowing humans to interactively guide machines where to look does not always improve a human-AI team's classification accuracy

Paper • 2404.05238 • Published Apr 8, 2024 • 3
Cognitive Architectures for Language Agents

Paper • 2309.02427 • Published Sep 5, 2023 • 8
Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings

Paper • 2305.13571 • Published May 23, 2023 • 2

Papers - University - MIT

One-step Diffusion with Distribution Matching Distillation

Paper • 2311.18828 • Published Nov 30, 2023 • 3
The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26, 2024 • 82
Condition-Aware Neural Network for Controlled Image Generation

Paper • 2404.01143 • Published Apr 1, 2024 • 13
Locating and Editing Factual Associations in GPT

Paper • 2202.05262 • Published Feb 10, 2022 • 1

Papers - Fine-tuning - DPO

Refer to additional papers: https://link.springer.com/article/10.1007/s10994-014-5458-8 and https://link.springer.com/article/10.1007/BF00992696

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 63
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization

Paper • 2402.09320 • Published Feb 14, 2024 • 6
sDPO: Don't Use Your Data All at Once

Paper • 2403.19270 • Published Mar 28, 2024 • 41
Dueling RL: Reinforcement Learning with Trajectory Preferences

Paper • 2111.04850 • Published Nov 8, 2021 • 2

Crystalcareai/GemMoE-Beta-1

Text Generation • Updated Mar 20, 2024 • 101 • 79
JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11, 2024 • 38
Multi-Head Mixture-of-Experts

Paper • 2404.15045 • Published Apr 23, 2024 • 60

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs