ELYZA-Diffusion-Instruct-1.0-Dream-7B


Model Description

ELYZA-Diffusion-Instruct-1.0-Dream-7B is a Japanese-adapted diffusion language model released by ELYZA, Inc. It is based on the open-source diffusion LLM Dream-v0-Instruct-7B, and further pretrained and instruction-tuned on large-scale Japanese data.

The model follows a Discrete Diffusion Masked Language Model (DDMLM) formulation, where text generation is performed via iterative denoising starting from an all-MASK sequence.
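To illustrate the idea, here is a minimal, simplified sketch of confidence-based iterative unmasking (not the model's actual decoding code): the sequence starts as all MASK tokens, and at each step the positions where the model is most confident are committed to their predicted tokens. The helper name toy_mask_denoise and the logits_fn callable are hypothetical stand-ins used only for illustration.

import torch

def toy_mask_denoise(logits_fn, length, mask_id, steps):
    # logits_fn: hypothetical callable mapping a (1, length) id tensor
    #            to (1, length, vocab) logits
    seq = torch.full((1, length), mask_id, dtype=torch.long)
    per_step = max(1, length // steps)                   # positions revealed per step
    for _ in range(steps):
        masked = seq == mask_id
        if not masked.any():
            break                                        # nothing left to denoise
        probs = torch.softmax(logits_fn(seq), dim=-1)
        conf, pred = probs.max(dim=-1)                   # per-position confidence and argmax
        conf = conf.masked_fill(~masked, float("-inf"))  # only consider still-masked slots
        k = min(per_step, int(masked.sum()))
        idx = conf[0].topk(k).indices                    # most confident masked positions
        seq[0, idx] = pred[0, idx]                       # commit their predicted tokens
    return seq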

In addition to Japanese continued pretraining, this model has undergone instruction tuning, enabling improved instruction-following and conversational behavior in Japanese.

For more details on the model design and training setup, please refer to our technical blog post.

Training

  • Initialization: Dream-v0-Instruct-7B
  • Continued pretraining on Japanese text (approximately 62B tokens)
  • Instruction tuning on Japanese instruction data (approximately 1.8B tokens, 10 epochs)

Usage

import torch
from transformers import AutoModel, AutoTokenizer

model_path = "elyza/ELYZA-Diffusion-Instruct-1.0-Dream-7B"

# Load the model with its custom diffusion-generation code (trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda").eval()

tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True,
)

# Prompt: "Please list five ideas for regaining enthusiasm for work."
messages = [
    {"role": "user", "content": "ไป•ไบ‹ใฎ็†ฑๆ„ใ‚’ๅ–ใ‚Šๆˆปใ™ใŸใ‚ใฎใ‚ขใ‚คใƒ‡ใ‚ขใ‚’5ใคๆŒ™ใ’ใฆใใ ใ•ใ„ใ€‚"}
]

# Apply the chat template and move the inputs to the GPU
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    return_dict=True,
    add_generation_prompt=True,
)

input_ids = inputs.input_ids.to("cuda")
attention_mask = inputs.attention_mask.to("cuda")

# Generate via iterative denoising; `steps` controls the number of diffusion steps
with torch.no_grad():
    output = model.diffusion_generate(
        input_ids,
        attention_mask=attention_mask,
        max_new_tokens=512,
        steps=256,
        temperature=0.5,
        top_p=0.95,
        alg="entropy",
        alg_temp=0.5
    )

# Decode only the newly generated tokens (everything after the prompt)
generated = tokenizer.decode(
    output.sequences[0][input_ids.size(1):],
    skip_special_tokens=True,
)

print(generated)

When using a smaller number of diffusion steps (e.g., up to 8x reduction), we recommend setting temperature and alg_temp to 0.5 or higher to maintain generation diversity and stability.
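For example, a reduced-step call might look like the following, reusing input_ids and attention_mask from the snippet above. The specific step count shown here is illustrative, not an official recommendation.

with torch.no_grad():
    output = model.diffusion_generate(
        input_ids,
        attention_mask=attention_mask,
        max_new_tokens=512,
        steps=64,           # fewer denoising steps -> faster generation
        temperature=0.5,    # keep at 0.5 or higher when reducing steps
        top_p=0.95,
        alg="entropy",
        alg_temp=0.5        # keep at 0.5 or higher when reducing steps
    )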

How to Cite

@misc{elyza2026dllm,
  title = {elyza/ELYZA-Diffusion-Base-1.0-Dream-7B},
  url = {https://huggingface.co/elyza/ELYZA-Diffusion-Base-1.0-Dream-7B},
  author = {Tasavat Trisitichoke and Akira Sasaki and Congda Ma and Ryosuke Nakamoto and Satoshi Tohda and Shoetsu Sato and Masato Hirakawa},
  year = {2026}
}

Citations

@article{ye2025dream,
  title = {Dream 7B: Diffusion Large Language Models},
  author = {Ye, Jiacheng and Xie, Zhihui and Zheng, Lin and Gao, Jiahui and Wu, Zirui and Jiang, Xin and Li, Zhenguo and Kong, Lingpeng},
  journal = {arXiv preprint arXiv:2508.15487},
  year = {2025}
}