For full details, check out the DR Tulu paper.
DR Tulu 8B - MLX
This is DR Tulu 8B converted to MLX format for efficient inference on Apple Silicon hardware.
MLX Model Variants
All variants are optimized for Apple Silicon with different memory/performance trade-offs:
| Model | Precision | Model Size | Bits/Weight | Memory Usage | Performance | Download |
|---|---|---|---|---|---|---|
| DR-Tulu-8B-MLX-4bit | 4-bit quantized | ~4.3GB | 4.500 | Lower | 78.2 tok/s | 🤗 HF |
| DR-Tulu-8B-MLX-6bit | 6-bit quantized | ~6.2GB | 6.500 | Medium | 60.7 tok/s | 🤗 HF |
| DR-Tulu-8B-MLX-8bit | 8-bit quantized | ~8.1GB | 8.500 | Medium-High | 59.8 tok/s | 🤗 HF |
| DR-Tulu-8B-MLX-bf16 | bfloat16 (full) | ~15.3GB | 16.000 | High | 35.0 tok/s | 🤗 HF |
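To pre-fetch a variant (for example, before going offline), the Hugging Face Hub client can download the weights ahead of time. A minimal sketch, assuming the repo names from the table above:

```python
# Pre-download one variant so later loads are instant and offline-friendly.
# Repo name follows the Plurigrid/DR-Tulu-8B-MLX-{variant} pattern above.
from huggingface_hub import snapshot_download

local_path = snapshot_download("Plurigrid/DR-Tulu-8B-MLX-4bit")
print(f"Model files cached at: {local_path}")
```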
🔥 Key Features:
- Original Model: rl-research/DR-Tulu-8B
- Hardware Optimized: Apple Silicon (M1/M2/M3/M4/M5)
- Conversion Framework: mlx-lm
- Research-Grade Choice: bf16 retains full precision for maximum quality and capability
- All variants maintain core research reasoning capabilities
🔥 MLX Conversion Details:
- Original Model: rl-research/DR-Tulu-8B
- Conversion: MLX format with bfloat16 precision (research-grade full precision)
- Model Size: ~15.3GB (down from 16.4GB original)
- Hardware Used: Mac Studio with Apple M1 Ultra (20-core, 128GB unified memory)
- Conversion Framework: mlx-lm
- Performance: ~35 tokens/sec, 16.4GB memory usage
Hardware Requirements
| Variant | Minimum RAM | Recommended RAM | Storage |
|---|---|---|---|
| 4bit | 8GB | 16GB | 5GB |
| 6bit | 16GB | 24GB | 7GB |
| 8bit | 16GB | 32GB | 9GB |
| bf16 | 24GB | 32GB+ | 16GB |
Tested Hardware: Mac Studio with Apple M1 Ultra (20-core, 128GB unified memory)
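As a rough guide, you can pick a variant programmatically from the table above. A minimal sketch (the thresholds simply mirror the Minimum RAM column; `hw.memsize` is the standard macOS sysctl key for physical memory):

```python
# Suggest a variant from total unified memory, mirroring the
# "Minimum RAM" column above. macOS only: hw.memsize reports
# physical memory in bytes.
import subprocess

mem_gb = int(subprocess.check_output(["sysctl", "-n", "hw.memsize"])) / 1024**3

if mem_gb >= 24:
    variant = "bf16"
elif mem_gb >= 16:
    variant = "8bit"  # 6bit also fits and uses less memory
else:
    variant = "4bit"

print(f"{mem_gb:.0f}GB unified memory -> suggested variant: {variant}")
```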
MLX Quick Start
Command Line Interface
Install and run with uvx:
```bash
# Generate text (replace {VARIANT} with 4bit, 6bit, 8bit, or bf16)
uvx --from mlx-lm mlx_lm.generate --model Plurigrid/DR-Tulu-8B-MLX-{VARIANT} --prompt "What is category theory and how does it apply to computer science?" --max-tokens 200

# Interactive chat
uvx --from mlx-lm mlx_lm.chat --model Plurigrid/DR-Tulu-8B-MLX-{VARIANT}
```
Python API
```python
from mlx_lm import load, generate

# Load model (replace {VARIANT} with 4bit, 6bit, 8bit, or bf16)
model, tokenizer = load("Plurigrid/DR-Tulu-8B-MLX-{VARIANT}")

prompt = "What is category theory and how does it apply to computer science?"

# Apply chat template if available
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

# Generate response
response = generate(model, tokenizer, prompt=prompt, verbose=True)
print(response)
```
Installation for Python API:
```bash
pip install mlx-lm
# or with uv
uv add mlx-lm
```
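For token-by-token output, mlx-lm also exposes a streaming generator. A minimal sketch (the chunk objects with a `.text` field follow recent mlx-lm releases; older versions yielded plain strings):

```python
# Streaming generation: print tokens as they are produced.
from mlx_lm import load, stream_generate

model, tokenizer = load("Plurigrid/DR-Tulu-8B-MLX-4bit")  # any variant works
messages = [{"role": "user", "content": "Summarize category theory in two sentences."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

for chunk in stream_generate(model, tokenizer, prompt, max_tokens=200):
    print(chunk.text, end="", flush=True)
print()
```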
Advanced Usage:
```python
# For research tasks with step-by-step reasoning
prompt = "Analyze the relationship between category theory and functional programming. Think step by step."

# Multi-turn conversation
messages = [
    {"role": "user", "content": "What is category theory?"},
    {"role": "assistant", "content": "Category theory is a mathematical framework..."},
    {"role": "user", "content": "How does it apply to computer science?"},
]

if tokenizer.chat_template is not None:
    formatted_prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    response = generate(model, tokenizer, prompt=formatted_prompt, max_tokens=500)
    print(response)
```
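Recent mlx-lm releases route sampling parameters (temperature, top-p) through a sampler object rather than keyword arguments on `generate`; a hedged sketch:

```python
# Control randomness with an explicit sampler (recent mlx-lm API;
# older releases accepted temp= directly on generate()).
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("Plurigrid/DR-Tulu-8B-MLX-4bit")
sampler = make_sampler(temp=0.7, top_p=0.95)

response = generate(
    model, tokenizer,
    prompt="List three applications of category theory in programming.",
    sampler=sampler,
    max_tokens=300,
)
print(response)
```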
About DR Tulu
This is the RL checkpoint of DR Tulu, an open deep research agent trained on top of rl-research/DR-Tulu-SFT-8B.
This model has undergone RL training on this dataset. For more details on DR Tulu, please read our paper!
Inference and Usage
Note: The original model was trained for tool-use using the dr-agent-lib framework. This MLX version provides general inference capabilities optimized for Apple Silicon.
For advanced tool-use functionality, see our GitHub or check out our demo!
Evaluation Results
Results from the original DR-Tulu-8B model:
| Benchmark | SQAv2 | HealthBench | ResearchQA | DeepResearch Bench | SimpleQA | 2Wiki | WebWalker | Average |
|---|---|---|---|---|---|---|---|---|
| Qwen3-8B (naive RAG) | 40.4 | 16.5 | 56.1 | 33.3 | 52.6 | 18.9 | 8.8 | 32.4 |
| Qwen3-8B (our search pipeline) | 57.2 | 5.9 | 46.3 | 18.2 | 70.5 | 44.0 | 27.9 | 38.6 |
| DR-Tulu-SFT-8B | 72.3 | 38.1 | 68.5 | 39.0 | 75.5 | 66.5 | 31.9 | 56.0 |
| DR-Tulu-8B (original) | 86.7 | 43.7 | 71.1 | 41.8 | 80.1 | 68.0 | 39.1 | 61.5 |
For more baselines, explanations of this table, and analysis of results, check out the DR Tulu paper!
Intended uses & limitations
This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's Responsible Use Guidelines.
MLX-specific considerations:
- Optimized for Apple Silicon hardware only
- The bf16 variant preserves the original weights at full precision (the research-grade choice, with no quality loss)
- Quantized variants trade some fidelity for lower memory use, but core research reasoning capabilities are maintained across all variants
Training
The script used to train the original model can be found here.
For hyperparameter details, check out the DR Tulu paper.
Links
- 📄 DR Tulu Paper
- ⚙️ DR Tulu demo
- 💻 DR Tulu code
- 🤗 DR Tulu collection
- Original model
- ⚡ MLX framework
Citation
```bibtex
@article{drtulu,
  title  = {{DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research}},
  author = {Rulin Shao and Akari Asai and Shannon Shen and Hamish Ivison and Varsha Kishore and Jingming Zhuo and Xinran Zhao and Molly Park and Sam Finlayson and David Sontag and Tyler Murray and Sewon Min and Pradeep Dasigi and Luca Soldaini and Faeze Brahman and Scott Yih and Sherry Tongshuang Wu and Luke Zettlemoyer and Yoon Kim and Hanna Hajishirzi and Pang Wei Koh},
  year   = {2025},
}
```
Conversion Details
- Date: November 22, 2025
- Converter: MLX community
- Command:
```bash
uvx --from mlx-lm mlx_lm.convert --hf-path rl-research/DR-Tulu-8B --mlx-path ./DR-Tulu-8B-bf16
```
- Precision: bfloat16 (full-precision MLX conversion)
- Hardware: Mac Studio, Apple M1 Ultra (20-core CPU, 128GB unified memory)
- OS: macOS Tahoe 26.2 (Darwin 25.2.0)
- Framework Version: mlx-lm latest (November 2025)
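The quantized variants can presumably be reproduced with mlx-lm's Python conversion API. A minimal sketch (the `q_bits`/`q_group_size` values are assumptions, though group size 64 is the mlx-lm default and matches the ~4.5 bits/weight reported for the 4-bit variant above):

```python
# Reproduce a quantized variant from the original weights.
# q_bits/q_group_size are assumptions; group size 64 is the mlx-lm
# default and accounts for the ~4.5 bits/weight of the 4-bit variant.
from mlx_lm import convert

convert(
    "rl-research/DR-Tulu-8B",
    mlx_path="./DR-Tulu-8B-MLX-4bit",
    quantize=True,
    q_bits=4,
    q_group_size=64,
)
```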
Model tree for Plurigrid/DR-Tulu-8B-MLX-bf16
- Base model: Qwen/Qwen3-8B-Base