ELYZA-Diffusion-Instruct-1.0-Dream-7B


Model Description

ELYZA-Diffusion-Instruct-1.0-Dream-7B is a Japanese-adapted diffusion language model released by ELYZA, Inc. It is based on the open-source diffusion LLM Dream-v0-Instruct-7B, and further pretrained and instruction-tuned on large-scale Japanese data.

The model follows a Discrete Diffusion Masked Language Model (DDMLM) formulation, where text generation is performed via iterative denoising starting from an all-MASK sequence.
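To illustrate the idea, here is a minimal, simplified sketch of confidence-based iterative unmasking (not the model's actual decoding code): the sequence starts as all MASK tokens, and at each step the positions where the model is most confident are committed to their predicted tokens. The helper name toy_mask_denoise and the logits_fn callable are hypothetical stand-ins used only for illustration.

import torch

def toy_mask_denoise(logits_fn, length, mask_id, steps):
    # logits_fn: hypothetical callable mapping a (1, length) id tensor
    #            to (1, length, vocab) logits
    seq = torch.full((1, length), mask_id, dtype=torch.long)
    per_step = max(1, length // steps)                   # positions revealed per step
    for _ in range(steps):
        masked = seq == mask_id
        if not masked.any():
            break                                        # nothing left to denoise
        probs = torch.softmax(logits_fn(seq), dim=-1)
        conf, pred = probs.max(dim=-1)                   # per-position confidence and argmax
        conf = conf.masked_fill(~masked, float("-inf"))  # only consider still-masked slots
        k = min(per_step, int(masked.sum()))
        idx = conf[0].topk(k).indices                    # most confident masked positions
        seq[0, idx] = pred[0, idx]                       # commit their predicted tokens
    return seq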

In addition to Japanese continued pretraining, this model has undergone instruction tuning, enabling improved instruction-following and conversational behavior in Japanese.

For more details on the model design and training setup, please refer to our technical blog post.

Training

  • Initialization: Dream-v0-Instruct-7B
  • Continued pretraining on Japanese text (approximately 62B tokens)
  • Instruction tuning on Japanese instruction data (approximately 1.8B tokens, 10 epochs)

Usage

import torch
from transformers import AutoModel, AutoTokenizer

model_path = "elyza/ELYZA-Diffusion-Instruct-1.0-Dream-7B"

# Load the model with its custom diffusion-generation code (trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda").eval()

tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True,
)

# Prompt: "Please list five ideas for regaining enthusiasm for work."
messages = [
    {"role": "user", "content": "ไป•ไบ‹ใฎ็†ฑๆ„ใ‚’ๅ–ใ‚Šๆˆปใ™ใŸใ‚ใฎใ‚ขใ‚คใƒ‡ใ‚ขใ‚’5ใคๆŒ™ใ’ใฆใใ ใ•ใ„ใ€‚"}
]

# Apply the chat template and move the inputs to the GPU
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    return_dict=True,
    add_generation_prompt=True,
)

input_ids = inputs.input_ids.to("cuda")
attention_mask = inputs.attention_mask.to("cuda")

# Generate via iterative denoising; `steps` controls the number of diffusion steps
with torch.no_grad():
    output = model.diffusion_generate(
        input_ids,
        attention_mask=attention_mask,
        max_new_tokens=512,
        steps=256,
        temperature=0.5,
        top_p=0.95,
        alg="entropy",
        alg_temp=0.5
    )

# Decode only the newly generated tokens (everything after the prompt)
generated = tokenizer.decode(
    output.sequences[0][input_ids.size(1):],
    skip_special_tokens=True,
)

print(generated)

When using a smaller number of diffusion steps (e.g., up to 8x reduction), we recommend setting temperature and alg_temp to 0.5 or higher to maintain generation diversity and stability.
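For example, a reduced-step call might look like the following, reusing input_ids and attention_mask from the snippet above. The specific step count shown here is illustrative, not an official recommendation.

with torch.no_grad():
    output = model.diffusion_generate(
        input_ids,
        attention_mask=attention_mask,
        max_new_tokens=512,
        steps=64,           # fewer denoising steps -> faster generation
        temperature=0.5,    # keep at 0.5 or higher when reducing steps
        top_p=0.95,
        alg="entropy",
        alg_temp=0.5        # keep at 0.5 or higher when reducing steps
    )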

How to Cite

@misc{elyza2026dllm,
  title = {elyza/ELYZA-Diffusion-Base-1.0-Dream-7B},
  url = {https://huggingface.co/elyza/ELYZA-Diffusion-Base-1.0-Dream-7B},
  author = {Tasavat Trisitichoke and Akira Sasaki and Congda Ma and Ryosuke Nakamoto and Satoshi Tohda and Shoetsu Sato and Masato Hirakawa},
  year = {2026}
}

Citations

@article{ye2025dream,
  title = {Dream 7B: Diffusion Large Language Models},
  author = {Ye, Jiacheng and Xie, Zhihui and Zheng, Lin and Gao, Jiahui and Wu, Zirui and Jiang, Xin and Li, Zhenguo and Kong, Lingpeng},
  journal = {arXiv preprint arXiv:2508.15487},
  year = {2025}
}