Furu Wei

thegenerality

AI & ML interests

None yet

Recent Activity

upvoted a paper 9 days ago

BitNet Distillation

upvoted a paper 29 days ago

Thinking Augmented Pre-training

authored a paper 30 days ago

Thinking Augmented Pre-training

Organizations

None yet

authored a paper 30 days ago

Thinking Augmented Pre-training

Paper • 2509.20186 • Published Sep 24 • 22

authored a paper 5 months ago

Rectified Sparse Attention

Paper • 2506.04108 • Published Jun 4 • 10

authored 2 papers 6 months ago

BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs

Paper • 2504.18415 • Published Apr 25 • 47

BitNet b1.58 2B4T Technical Report

Paper • 2504.12285 • Published Apr 16 • 75

authored a paper 8 months ago

mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data

Paper • 2502.08468 • Published Feb 12 • 15

authored a paper 9 months ago

Chain-of-Retrieval Augmented Generation

Paper • 2501.14342 • Published Jan 24 • 58

authored 2 papers 10 months ago

GeAR: Generation Augmented Retrieval

Paper • 2501.02772 • Published Jan 6 • 22

Multimodal Latent Language Modeling with Next-Token Diffusion

Paper • 2412.08635 • Published Dec 11, 2024 • 48

authored 2 papers 11 months ago

BitNet a4.8: 4-bit Activations for 1-bit LLMs

Paper • 2411.04965 • Published Nov 7, 2024 • 69

MH-MoE: Multi-Head Mixture-of-Experts

Paper • 2411.16205 • Published Nov 25, 2024 • 28

authored 3 papers about 1 year ago

authored 7 papers over 1 year ago

Autoregressive Speech Synthesis without Vector Quantization

Paper • 2407.08551 • Published Jul 11, 2024 • 17

Direct Preference Knowledge Distillation for Large Language Models

Paper • 2406.19774 • Published Jun 28, 2024 • 22

Instruction Pre-Training: Language Models are Supervised Multitask Learners

Paper • 2406.14491 • Published Jun 20, 2024 • 95

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

Paper • 2406.05370 • Published Jun 8, 2024 • 19

MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Paper • 2405.12130 • Published May 20, 2024 • 50

Multi-Head Mixture-of-Experts

Paper • 2404.15045 • Published Apr 23, 2024 • 60

MathScale: Scaling Instruction Tuning for Mathematical Reasoning

Paper • 2403.02884 • Published Mar 5, 2024 • 17
