RND1-Base-0910
RND1 is an experimental diffusion language model with 30B parameters and 3B active parameters per token (sparse Mixture-of-Experts). This model was converted from a pretrained autoregressive base to enable diffusion-based text generation.
Model Overview
RND1-Base-0910 has the following features:
- Type: Diffusion Language Model
- Number of Parameters: 30.5B total, 3.3B activated per token
- Architecture: Sparse Mixture-of-Experts
- Training: Converted from pretrained autoregressive base (Qwen3-30B-A3B)
For more details, see:
- Code: https://github.com/RadicalNumerics/RND1
- Report: https://www.radicalnumerics.ai/assets/rnd1_report.pdf
- Blog: https://www.radicalnumerics.ai/blog/rnd1
Note: RND1-Base-0910 has not been post-trained. Expect occasional repetition with greedy samplers.
Installation
pip install torch transformers accelerate numpy rich
For faster inference with optimized MoE kernels:
pip install flashinfer-python
pip install sglang[all]
pip install vllm
Selecting a non-Hugging Face MoE backend is strongly encouraged for faster generation. Note, however, that non-HF backends currently support only a single GPU, so set, for example,
export CUDA_VISIBLE_DEVICES=0
before running the script. If you use flashinfer-python, JIT compilation may take a while the first time the code runs unless flashinfer-jit-cache is installed.
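If you launch generation from a Python script rather than a shell, the same single-GPU restriction can be applied in code. The sketch below assumes it runs before torch or any other CUDA-aware library has initialized CUDA; `pin_single_gpu` is a hypothetical helper name, not part of RND1.

```python
import os


def pin_single_gpu(index: int = 0) -> str:
    """Expose only one GPU to this process (hypothetical helper).

    Call this before importing torch or transformers: once CUDA has been
    initialized, changing the variable no longer has any effect.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


pin_single_gpu(0)
# Import torch / transformers only after this point.
```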
Quick Start
from transformers import AutoTokenizer, AutoModelForMaskedLM
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("radicalnumerics/RND1-Base-0910", trust_remote_code=True)
# Load model
model = AutoModelForMaskedLM.from_pretrained(
    "radicalnumerics/RND1-Base-0910",
    dtype="bfloat16",
    device_map="auto",
    trust_remote_code=True,
    moe_backend="vllm",  # hf, sglang, vllm, flashinfer
)
# Generate - Task mode (for instructions and questions)
prompt = "Write a Python function that finds the longest common subsequence of two strings. Include comments explaining the algorithm."
inputs = tokenizer(f"Question: {prompt}\nAnswer:", return_tensors="pt")
input_ids = inputs.input_ids.to(model.device)
# Generate
output = model.generate(
    inputs=input_ids,
    max_new_tokens=256,
    num_diffusion_steps=256,
    temperature=0.01,
)
# Decode the returned sequence
text = tokenizer.decode(output[0], skip_special_tokens=True)
print(text)
Generation Parameters
Key parameters for text generation:
- max_new_tokens: Number of tokens to generate (default: 256)
- num_diffusion_steps: Diffusion denoising steps (default: 256)
- temperature: Sampling temperature, 0.0 for greedy (default: 0.0)
- top_k: Top-k filtering for sampling
- top_p: Nucleus filtering for sampling
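To make the sampling parameters concrete, here is a minimal sketch of how temperature, top-k, and nucleus (top-p) filtering are conventionally combined when picking a single token from a list of logits. It is a generic illustration in plain Python, not RND1's sampler; `sample_token` and its signature are assumptions made for this example.

```python
import math
import random


def sample_token(logits, temperature=0.0, top_k=None, top_p=None, rng=random):
    """Pick one token index from raw logits (illustrative only)."""
    # temperature == 0.0 means greedy: take the highest-scoring token.
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])

    # Scale logits by temperature, then convert to probabilities (softmax).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Rank candidates from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # top_k: keep only the k most likely tokens.
    if top_k is not None:
        ranked = ranked[:top_k]

    # top_p: keep the smallest prefix whose cumulative probability reaches p.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in ranked:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        ranked = kept

    # Sample among the surviving tokens, weighted by probability.
    return rng.choices(ranked, weights=[probs[i] for i in ranked], k=1)[0]
```

Lower temperatures sharpen the distribution toward the top token, while top_k and top_p both shrink the candidate set before sampling.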
Generation Modes
Task Mode (default): For instructions, questions, or requests. Wrap your prompt as "Question: {prompt}\nAnswer:", as in the Quick Start example.
Completion Mode: For text continuation. Use prompt directly without prefix.
# Completion mode example
prompt = "The key to understanding quantum computing lies in"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
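output = model.generate(
    inputs=inputs.input_ids,
    max_new_tokens=256,
    num_diffusion_steps=256,
    temperature=0.01,
)
# Decode and print the result
text = tokenizer.decode(output[0], skip_special_tokens=True)
print(text)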
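The two modes differ only in how the prompt string is built before tokenization. A small helper can keep that choice in one place; `build_prompt` is a hypothetical name, and the task template is copied from the Quick Start example.

```python
def build_prompt(text: str, mode: str = "task") -> str:
    """Format a prompt for the given generation mode (hypothetical helper)."""
    if mode == "task":
        # Same template as the Quick Start example.
        return f"Question: {text}\nAnswer:"
    if mode == "completion":
        # Completion mode passes the text through unchanged.
        return text
    raise ValueError(f"unknown mode: {mode!r}")
```

The returned string can be passed to the tokenizer exactly as in the examples above.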
Command-Line Interface
Using the demo script demo_rnd_generation.py from the GitHub repo:
# Task mode (default) - for instructions, questions, or requests
python demo_rnd_generation.py --prompt "Write a Python function that finds the longest common subsequence of two strings. Include comments explaining the algorithm." --moe_backend hf
# Completion mode - for text continuation
python demo_rnd_generation.py --mode completion --prompt "The key to understanding quantum computing lies in" --moe_backend hf
# Sampling parameters
python demo_rnd_generation.py --top_k 50 --temperature 0.7 --prompt "Explain how neural networks learn in simple terms" --moe_backend hf
Technical Details
RND1 uses a diffusion process for text generation, iteratively denoising random tokens over multiple steps. This approach differs from traditional autoregressive generation and enables parallel token generation within each diffusion step.
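The toy loop below illustrates the general idea of parallel, iterative denoising. It makes several assumptions not stated in this card: generation starts from placeholder `<mask>` tokens, the most confident proposals are committed at each step, and a stub `toy_denoiser` with random confidences stands in for the model. RND1's actual sampler may differ; see the report for details.

```python
import random

MASK = "<mask>"  # placeholder token, for illustration only


def toy_denoiser(tokens, rng):
    """Stand-in for the model: proposes (token, confidence) for every position."""
    target = ["the", "cat", "sat", "on", "the", "mat"]
    return [(target[i % len(target)], rng.random()) for i in range(len(tokens))]


def diffusion_decode(length, num_steps, seed=0):
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(num_steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # One pass scores all positions at once (the parallel part).
        proposals = toy_denoiser(tokens, rng)
        # Commit an even share of the remaining masked positions this step.
        steps_left = num_steps - step
        n_commit = -(-len(masked) // steps_left)  # ceiling division
        best = sorted(masked, key=lambda i: proposals[i][1], reverse=True)[:n_commit]
        for i in best:
            tokens[i] = proposals[i][0]
    return tokens


print(diffusion_decode(length=6, num_steps=3))
```

With fewer steps than tokens, several positions are filled per step, which is where the speed advantage over one-token-at-a-time decoding would come from.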
The model architecture is based on a sparse Mixture-of-Experts design, activating only a subset of parameters for each token to balance computational efficiency with model capacity.
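As a rough illustration of sparse activation, the sketch below routes an input to the top-k experts by router score and combines their outputs. Scalar inputs, plain-function experts, and renormalizing the softmax over only the selected experts are simplifying assumptions for this example, not a description of RND1's router.

```python
import math


def route_top_k(router_logits, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts."""
    top = sorted(
        range(len(router_logits)), key=lambda i: router_logits[i], reverse=True
    )[:k]
    # Softmax over the selected experts only, so their weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(x, experts, router_logits, k=2):
    """Weighted sum of the k selected experts' outputs; the others never run."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Four toy experts; only two are evaluated for this input.
experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
print(moe_layer(2.0, experts, router_logits=[0.0, 1.0, 1.0, -5.0], k=2))
```

Because only k experts execute per token, compute scales with the activated parameters rather than the total parameter count.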
Citation
If you use RND1 in your research, please cite:
@misc{rnd1-report,
title={Training Diffusion Language Models at Scale using Autoregressive Models},
author={Radical Numerics},
year={2025},
}