Inference Providers documentation

NeMo Data Designer

Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

NeMo Data Designer

DataDesigner is NVIDIA NeMo’s framework for generating high-quality synthetic datasets using LLMs. It enables you to create diverse data using statistical samplers, LLMs, or existing seed datasets while maintaining control over field relationships and data quality.

Overview

DataDesigner supports OpenAI-compatible endpoints, making it easy to use any model available through Hugging Face Inference Providers for synthetic data generation.

Prerequisites

  • DataDesigner installed (pip install data-designer)
  • A Hugging Face account with API token (needs “Make calls to Inference Providers” permission)

Configuration

1. Set your HF token

export HF_TOKEN="hf_your_token_here"

2. Configure HF as a provider

from data_designer.essentials import (
    CategorySamplerParams,
    DataDesigner,
    DataDesignerConfigBuilder,
    LLMTextColumnConfig,
    ModelConfig,
    ModelProvider,
    SamplerColumnConfig,
    SamplerType,
)

# Define HF Inference Provider (OpenAI-compatible)
hf_provider = ModelProvider(
    name="huggingface",
    endpoint="/static-proxy?url=https%3A%2F%2Frouter.huggingface.co%2Fv1%26quot%3B%3C%2Fspan%3E%2C
    provider_type="openai",
    api_key="HF_TOKEN",  # Reads from environment variable
)

# Define a model available via HF Inference Providers
hf_model = ModelConfig(
    alias="hf-gpt-oss",
    model="openai/gpt-oss-120b",
    provider="huggingface",
)

# Create DataDesigner with HF provider
data_designer = DataDesigner(model_providers=[hf_provider])
config_builder = DataDesignerConfigBuilder(model_configs=[hf_model])

3. Generate synthetic data

# Add a sampler column
config_builder.add_column(
    SamplerColumnConfig(
        name="category",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=["Electronics", "Books", "Clothing"],
        ),
    )
)

# Add an LLM-generated column
config_builder.add_column(
    LLMTextColumnConfig(
        name="product_name",
        model_alias="hf-gpt-oss",
        prompt="Generate a creative product name for a {{ category }} item.",
    )
)

# Preview the generated data
preview = data_designer.preview(config_builder=config_builder, num_records=5)
preview.display_sample_record()

# Access the DataFrame
df = preview.dataset
print(df)

Using Different Models

You can use any model available through Inference Providers. Simply update the model field:

# Use a different model
hf_model = ModelConfig(
    alias="hf-olmo",
    model="allenai/OLMo-3-7B-Instruct",
    provider="huggingface",
)

Resources

Update on GitHub