rbgo (Rajdeep Borgohain)

upvoted an article 2 months ago

Article

Seq vs Seq: the Ettin Suite of Paired Encoders and Decoders

Jul 16

• 74

upvoted a paper 3 months ago

SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?

Paper • 2507.12415 • Published Jul 16 • 42

upvoted an article 4 months ago

Article

SmolLM3: smol, multilingual, long-context reasoner

Jul 8

• 701

upvoted a collection 7 months ago

Gemma 3 Release

Collection

28 items • Updated Aug 11 • 522

upvoted an article 7 months ago

Article

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

Mar 12

• 467

upvoted 2 articles 8 months ago

Article

Inside the family of Smol models

Feb 27

• 13

Article

SmolLM - blazingly fast and remarkably powerful

Jul 16, 2024

• 420

upvoted a paper 8 months ago

Kanana: Compute-efficient Bilingual Language Models

Paper • 2502.18934 • Published Feb 26 • 65

upvoted a collection 8 months ago

Phi-4

Collection

Phi-4 family of small language, multi-modal and reasoning models. • 17 items • Updated Jul 10 • 186

upvoted a paper 8 months ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 207

upvoted a paper 9 months ago

Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

Paper • 2502.06703 • Published Feb 10 • 153

upvoted 2 articles 9 months ago

Article

Mastering Long Contexts in LLMs with KVPress

Jan 23

• 70

Article

Open-R1: a fully open reproduction of DeepSeek-R1

Jan 28

• 882

upvoted 4 collections 9 months ago

Qwen2.5-VL

Qwen2.5-1M

DeepSeek-V2

DeepSeek-LLM

upvoted a paper 9 months ago

KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models

Paper • 2412.06071 • Published Dec 8, 2024 • 9

upvoted an article 9 months ago

Article

Timm ❤️ Transformers: Use any timm model with transformers

Jan 16

• 51

upvoted a paper 10 months ago

Phi-4 Technical Report

Paper • 2412.08905 • Published Dec 12, 2024 • 121
