NeMo Data Designer
DataDesigner is NVIDIA NeMo’s framework for generating high-quality synthetic datasets using LLMs. It enables you to create diverse data using statistical samplers, LLMs, or existing seed datasets while maintaining control over field relationships and data quality.
Overview
DataDesigner supports OpenAI-compatible endpoints, making it easy to use any model available through Hugging Face Inference Providers for synthetic data generation.
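Because the endpoint is OpenAI-compatible, you can sanity-check connectivity with any OpenAI-compatible client before wiring it into DataDesigner. The snippet below is a minimal sketch using the official openai Python package (not a DataDesigner dependency); it assumes your Hugging Face token is exported as HF_TOKEN, as described under Configuration.

```python
# Minimal sketch: call the Hugging Face router directly with the OpenAI client.
# Assumes `pip install openai` and that HF_TOKEN is set in the environment.
import os
from openai import OpenAI

client = OpenAI(
    base_url="/static-proxy?url=https%3A%2F%2Frouter.huggingface.co%2Fv1%26quot%3B%2C
    api_key=os.environ["HF_TOKEN"],
)

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(response.choices[0].message.content)
```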
Prerequisites
- DataDesigner installed (pip install data-designer)
- A Hugging Face account with an API token that has the “Make calls to Inference Providers” permission
Configuration
1. Set your HF token
export HF_TOKEN="hf_your_token_here"

2. Configure HF as a provider
from data_designer.essentials import (
    CategorySamplerParams,
    DataDesigner,
    DataDesignerConfigBuilder,
    LLMTextColumnConfig,
    ModelConfig,
    ModelProvider,
    SamplerColumnConfig,
    SamplerType,
)
# Define HF Inference Provider (OpenAI-compatible)
hf_provider = ModelProvider(
name="huggingface",
endpoint="/static-proxy?url=https%3A%2F%2Frouter.huggingface.co%2Fv1%26quot%3B%3C%2Fspan%3E%2C
provider_type="openai",
api_key="HF_TOKEN", # Reads from environment variable
)
# Define a model available via HF Inference Providers
hf_model = ModelConfig(
alias="hf-gpt-oss",
model="openai/gpt-oss-120b",
provider="huggingface",
)
# Create DataDesigner with HF provider
data_designer = DataDesigner(model_providers=[hf_provider])
config_builder = DataDesignerConfigBuilder(model_configs=[hf_model])

3. Generate synthetic data
# Add a sampler column
config_builder.add_column(
    SamplerColumnConfig(
        name="category",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=["Electronics", "Books", "Clothing"],
        ),
    )
)
# Add an LLM-generated column
config_builder.add_column(
    LLMTextColumnConfig(
        name="product_name",
        model_alias="hf-gpt-oss",
        prompt="Generate a creative product name for a {{ category }} item.",
    )
)
# Preview the generated data
preview = data_designer.preview(config_builder=config_builder, num_records=5)
preview.display_sample_record()
# Access the DataFrame
df = preview.dataset
print(df)

Using Different Models
You can use any model available through Inference Providers. Simply update the model field:
# Use a different model
hf_model = ModelConfig(
    alias="hf-olmo",
    model="allenai/OLMo-3-7B-Instruct",
    provider="huggingface",
)