For full information, check out the DR Tulu paper.

DR Tulu 8B - MLX

This is DR Tulu 8B converted to MLX format for efficient inference on Apple Silicon hardware.

MLX Model Variants

All variants are optimized for Apple Silicon with different memory/performance trade-offs:

| Model | Precision | Model Size | Bits/Weight | Memory Usage | Performance | Download |
|---|---|---|---|---|---|---|
| DR-Tulu-8B-MLX-4bit | 4-bit quantized | ~4.3GB | 4.500 | Lower | 78.2 tok/s | πŸ€— HF |
| DR-Tulu-8B-MLX-6bit | 6-bit quantized | ~6.2GB | 6.500 | Medium | 60.7 tok/s | πŸ€— HF |
| DR-Tulu-8B-MLX-8bit | 8-bit quantized | ~8.1GB | 8.500 | Medium-High | 59.8 tok/s | πŸ€— HF |
| DR-Tulu-8B-MLX-bf16 | bfloat16 (full) | ~15.3GB | 16.000 | High | 35.0 tok/s | πŸ€— HF |
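
The extra 0.5 bits/weight on the quantized variants is quantization overhead: mlx-lm stores a scale and bias per group of weights, and with the default 64-weight groups, two fp16 values add 32/64 = 0.5 bits per weight. A rough size check against the table, assuming Qwen3-8B's ~8.2B parameters:

```python
# Back-of-envelope model sizes from the Bits/Weight column.
# 8.2e9 is the approximate parameter count of Qwen3-8B.
params = 8.2e9
for name, bpw in [("4bit", 4.5), ("6bit", 6.5), ("8bit", 8.5), ("bf16", 16.0)]:
    size_gib = params * bpw / 8 / 1024**3
    print(f"{name}: ~{size_gib:.1f} GiB")  # ~4.3, 6.2, 8.1, 15.3 -- matching the table
```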

πŸ”₯ Key Features:

  • Original Model: rl-research/DR-Tulu-8B
  • Hardware Optimized: Apple Silicon (M1/M2/M3/M4/M5)
  • Conversion Framework: mlx-lm
  • Research-Grade Choice: bf16 preserves full precision for maximum quality and capability
  • All variants maintain core research reasoning capabilities

πŸ”₯ MLX Conversion Details:

  • Original Model: rl-research/DR-Tulu-8B
  • Conversion: MLX format with bfloat16 precision (research-grade full precision)
  • Model Size: ~15.3GB (down from 16.4GB original)
  • Hardware Used: Mac Studio with Apple M1 Ultra (20-core, 128GB unified memory)
  • Conversion Framework: mlx-lm
  • Performance: ~35 tokens/sec, 16.4GB memory usage

Hardware Requirements

| Variant | Minimum RAM | Recommended RAM | Storage |
|---|---|---|---|
| 4bit | 8GB | 16GB | 5GB |
| 6bit | 16GB | 24GB | 7GB |
| 8bit | 16GB | 32GB | 9GB |
| bf16 | 24GB | 32GB+ | 16GB |

Tested Hardware: Mac Studio with Apple M1 Ultra (20-core, 128GB unified memory)
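
To check how much unified memory a machine has before picking a variant, Python's os.sysconf works on macOS; the thresholds below simply mirror the Minimum RAM column of the table, and the selection loop is illustrative only:

```python
import os

# Total unified memory on macOS (page size x page count)
total_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
print(f"Unified memory: {total_gb:.0f} GB")

# Largest variant whose minimum-RAM figure (from the table above) fits
for variant, min_gb in [("bf16", 24), ("8bit", 16), ("6bit", 16), ("4bit", 8)]:
    if total_gb >= min_gb:
        print(f"Largest variant meeting the minimum: {variant}")
        break
```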

MLX Quick Start

Command Line Interface

Install and run with uvx:

```bash
# Generate text (replace {VARIANT} with 4bit, 6bit, 8bit, or bf16)
uvx --from mlx-lm mlx_lm.generate --model Plurigrid/DR-Tulu-8B-MLX-{VARIANT} --prompt "What is category theory and how does it apply to computer science?" --max-tokens 200

# Interactive chat
uvx --from mlx-lm mlx_lm.chat --model Plurigrid/DR-Tulu-8B-MLX-{VARIANT}
```
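
To avoid a pause on first use, you can pre-fetch a variant into the local Hugging Face cache with huggingface_hub (installed alongside mlx-lm); a minimal sketch, using the 4-bit variant as an example:

```python
from huggingface_hub import snapshot_download

# Download once; later mlx_lm calls reuse the cached copy.
# mlx_lm.load() also accepts the returned local path directly.
local_path = snapshot_download("Plurigrid/DR-Tulu-8B-MLX-4bit")
print(local_path)
```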

Python API

```python
from mlx_lm import load, generate

# Load model (replace {VARIANT} with 4bit, 6bit, 8bit, or bf16)
model, tokenizer = load("Plurigrid/DR-Tulu-8B-MLX-{VARIANT}")

prompt = "What is category theory and how does it apply to computer science?"

# Apply chat template if available
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

# Generate response
response = generate(model, tokenizer, prompt=prompt, verbose=True)
print(response)
```

Installation for Python API:

```bash
pip install mlx-lm
# or with uv
uv add mlx-lm
```

Advanced Usage:

```python
# Assumes model and tokenizer are already loaded as in the Quick Start above

# For research tasks with step-by-step reasoning
prompt = "Analyze the relationship between category theory and functional programming. Think step by step."

# Multi-turn conversation
messages = [
    {"role": "user", "content": "What is category theory?"},
    {"role": "assistant", "content": "Category theory is a mathematical framework..."},
    {"role": "user", "content": "How does it apply to computer science?"}
]

if tokenizer.chat_template is not None:
    formatted_prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    response = generate(model, tokenizer, prompt=formatted_prompt, max_tokens=500)
    print(response)
```
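
For long research-style answers it is often nicer to stream tokens as they arrive instead of waiting for the full completion. A sketch using mlx_lm's stream_generate (recent mlx-lm versions yield response objects with a .text field; older versions differ slightly):

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("Plurigrid/DR-Tulu-8B-MLX-4bit")  # any variant works

messages = [{"role": "user", "content": "What is category theory?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Print tokens incrementally as they are generated
for chunk in stream_generate(model, tokenizer, prompt=prompt, max_tokens=500):
    print(chunk.text, end="", flush=True)
print()
```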

About DR Tulu

This is the RL checkpoint of DR Tulu, an open deep research agent trained on top of rl-research/DR-Tulu-SFT-8B.

This model has undergone RL training on this dataset. For more details on DR Tulu, please read our paper!

Inference and Usage

Note: The original model was trained for tool-use using the dr-agent-lib framework. This MLX version provides general inference capabilities optimized for Apple Silicon.

For advanced tool-use functionality, see our GitHub or check out our demo!

Evaluation Results

Results from the original DR-Tulu-8B model:

| Model | SQAv2 | HealthBench | ResearchQA | DeepResearch Bench | SimpleQA | 2Wiki | WebWalker | Average |
|---|---|---|---|---|---|---|---|---|
| Qwen3-8B (naive RAG) | 40.4 | 16.5 | 56.1 | 33.3 | 52.6 | 18.9 | 8.8 | 32.4 |
| Qwen3-8B (our search pipeline) | 57.2 | 5.9 | 46.3 | 18.2 | 70.5 | 44.0 | 27.9 | 38.6 |
| DR-Tulu-SFT-8B | 72.3 | 38.1 | 68.5 | 39.0 | 75.5 | 66.5 | 31.9 | 56.0 |
| DR-Tulu-8B (original) | 86.7 | 43.7 | 71.1 | 41.8 | 80.1 | 68.0 | 39.1 | 61.5 |
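
The Average column is the unweighted mean of the seven benchmark scores, which is easy to verify for any row:

```python
# DR-Tulu-8B (original) row from the table above
scores = [86.7, 43.7, 71.1, 41.8, 80.1, 68.0, 39.1]
print(round(sum(scores) / len(scores), 1))  # 61.5
```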

For more baselines, explanations of this table, and analysis of results, check out the DR Tulu paper!

Intended uses & limitations

This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's Responsible Use Guidelines.

MLX-specific considerations:

  • Optimized for Apple Silicon hardware only
  • bf16 maintains full model quality (research-grade full precision)
  • Quantized variants (4/6/8-bit) incur only minimal quality loss in exchange for lower memory use
  • Core research reasoning capabilities are preserved across all variants

Training

The script used to train the original model can be found here.

For hyperparameter details, check out the DR Tulu paper.

Citation

```bibtex
@article{drtulu,
  title  = {{DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research}},
  author = {Rulin Shao and Akari Asai and Shannon Shen and Hamish Ivison and Varsha Kishore and Jingming Zhuo and Xinran Zhao and Molly Park and Sam Finlayson and David Sontag and Tyler Murray and Sewon Min and Pradeep Dasigi and Luca Soldaini and Faeze Brahman and Scott Yih and Sherry Tongshuang Wu and Luke Zettlemoyer and Yoon Kim and Hanna Hajishirzi and Pang Wei Koh},
  year   = {2025},
}
```

Conversion Details

  • Date: November 22, 2025
  • Converter: MLX community
  • Command: `uvx --from mlx-lm mlx_lm.convert --hf-path rl-research/DR-Tulu-8B --mlx-path ./DR-Tulu-8B-bf16`
  • Precision: bfloat16 (full-precision MLX conversion)
  • Hardware: Mac Studio, Apple M1 Ultra (20-core CPU, 128GB unified memory)
  • OS: macOS Sequoia 15.2 (Darwin 25.2.0)
  • Framework Version: mlx-lm latest (November 2025)
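
The quantized variants can be produced with the same tooling. A sketch using mlx_lm's Python conversion API (keyword names per current mlx-lm; check your installed version, as the signature has changed across releases):

```python
from mlx_lm import convert

# Convert the original weights to a 4-bit MLX model.
# quantize/q_bits/q_group_size are mlx-lm's quantization knobs;
# set q_bits to 6 or 8 for the other quantized variants.
convert(
    hf_path="rl-research/DR-Tulu-8B",
    mlx_path="./DR-Tulu-8B-4bit",
    quantize=True,
    q_bits=4,
    q_group_size=64,
)
```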