Article: No GPU Left Behind: Unlocking Efficiency with Co-located vLLM in TRL (Jun 3)
Article: Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance (May 21)
Article: Fine-tuning LLMs to 1.58bit: Extreme Quantization Made Easy (Sep 18, 2024)
Paper: Optimizing Large Language Model Training Using FP4 Quantization (arXiv:2501.17116, published Jan 28)
Collection: Mamba2-In-Llama3 (4 items, updated Sep 9, 2024). Mamba2 models distilled from Llama3 8B Instruct, following "The Mamba in the Llama: Distilling and Accelerating Hybrid Models" (https://arxiv.org/abs/2408.15237).
Paper: The Mamba in the Llama: Distilling and Accelerating Hybrid Models (arXiv:2408.15237, published Aug 27, 2024)
Paper: QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models (arXiv:2310.16795, published Oct 25, 2023)