Curating datasets directly on the Hub

Community Article Published November 27, 2025

You can now edit datasets directly on the Hub. This is huge: fixes and quick data curation no longer require a download/edit/upload cycle. It's early days, but I believe this will fundamentally change dataset workflows for AI.

The most interesting part, in my opinion, is collaborative dataset curation. Multiple people across your organization can commit to the same dataset, review changes, and improve data quality together, all with full versioning and traceability.

In this post, I'll walk you through how it works with a practical example.

Requirements

Currently, you can edit datasets on the Hub if:

  1. The dataset contains a single CSV file (more formats are coming).
  2. You have write access to it (your personal datasets, or any organization or dataset where you have write permissions).
  3. It has textual (string) columns; cells in these columns are the ones you can edit.

Walkthrough: fixing dataset errors

Say you or your team published a sentiment analysis dataset, and someone spots errors. Here's how you fix them.

  1. Go to the dataset page.

  2. Open Data Studio to inspect the dataset. In the screenshots below, for example, you can spot an error in the label distribution: some values are "negativ" instead of "negative".

[Screenshot: The value distribution shows three values when only two are expected]
[Screenshot: Filtering confirms there's a typo in some label names]
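Data Studio computes this distribution and filter for you. If you also want to run the same check locally, for example before pushing a new version, a minimal sketch with Python's standard library could look like this. It assumes a CSV with `text` and `label` columns; the sample rows and the `label_distribution`/`unexpected_rows` helpers are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the shape of a sentiment CSV (text,label);
# these rows are illustrative, not the dataset from the screenshots.
CSV_DATA = """text,label
I loved this movie,positive
Terrible acting,negativ
Great soundtrack,positive
Boring plot,negative
Would not watch again,negativ
"""

EXPECTED_LABELS = {"positive", "negative"}


def label_distribution(rows, column="label"):
    """Count how often each value appears in `column` (the value distribution)."""
    return Counter(row[column] for row in rows)


def unexpected_rows(rows, column="label", expected=EXPECTED_LABELS):
    """Filter: return (row_index, value) pairs whose value is not an expected label."""
    return [(i, row[column]) for i, row in enumerate(rows) if row[column] not in expected]


rows = list(csv.DictReader(io.StringIO(CSV_DATA)))
dist = label_distribution(rows)
print(len(dist), "distinct labels:", dict(dist))  # three values where two are expected
print(unexpected_rows(rows))  # → [(1, 'negativ'), (4, 'negativ')]
```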
  3. If you have write access, you will see a Toggle Edit Mode button. Click it to edit individual cells in string columns, as in the screenshot below:
[Screenshot: Editing individual cells in the dataset]
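In Edit Mode you change cells one at a time in the browser. The closest local equivalent is a targeted value replacement in a single column. The sketch below uses a hypothetical `replace_value` helper and sample rows, and works on a copy so the original rows stay available for comparison.

```python
import copy


def replace_value(rows, column, wrong, right):
    """Return (edited_copy, changed_indices): every `wrong` in `column` becomes `right`.

    Only the given column is touched, and `rows` itself is not modified.
    """
    edited = copy.deepcopy(rows)
    changed = []
    for i, row in enumerate(edited):
        if row[column] == wrong:
            row[column] = right
            changed.append(i)
    return edited, changed


# Hypothetical rows with the same typo as in the walkthrough.
rows = [
    {"text": "I loved this movie", "label": "positive"},
    {"text": "Terrible acting", "label": "negativ"},
    {"text": "Boring plot", "label": "negative"},
    {"text": "Would not watch again", "label": "negativ"},
]

edited, changed = replace_value(rows, "label", "negativ", "negative")
print(changed)  # → [1, 3]
print(rows[1]["label"], "->", edited[1]["label"])  # → negativ -> negative
```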
  4. Once you're happy with your edits, click Commit to submit your changes. This commits them to the dataset repo and lets you write a descriptive commit message:
[Screenshot: Ready to commit two changes]
[Screenshot: Adding a descriptive commit message]
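The commit dialog reports how many cells you changed. Conceptually, that count comes from a cell-level diff between the previous and the edited version of the file. The sketch below computes such a diff for in-place edits; the `CellChange` type, the `diff_cells` helper, and the commit-message format are illustrative assumptions, not how the Hub builds its commits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellChange:
    row: int
    column: str
    old: str
    new: str


def diff_cells(original, edited):
    """List every cell whose value differs between two row lists of the same shape."""
    if len(original) != len(edited):
        raise ValueError("row counts differ; this sketch only covers in-place cell edits")
    changes = []
    for i, (before, after) in enumerate(zip(original, edited)):
        for column, old in before.items():
            new = after[column]
            if old != new:
                changes.append(CellChange(i, column, old, new))
    return changes


original = [
    {"text": "Terrible acting", "label": "negativ"},
    {"text": "Great soundtrack", "label": "positive"},
    {"text": "Would not watch again", "label": "negativ"},
]
edited = [
    {"text": "Terrible acting", "label": "negative"},
    {"text": "Great soundtrack", "label": "positive"},
    {"text": "Would not watch again", "label": "negative"},
]

changes = diff_cells(original, edited)
# Hypothetical commit message derived from the diff.
message = f"Fix label typo 'negativ' -> 'negative' ({len(changes)} cells)"
print(message)  # → Fix label typo 'negativ' -> 'negative' (2 cells)
```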

The change now appears in the dataset history, so you can trace back every curation action:

[Screenshot: Changes are versioned in the dataset history]
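Because each commit is kept in the dataset repo, any cell can be traced back to the edits that touched it. The in-memory model below is a simplified, hypothetical illustration of that idea: an append-only log of commits, each holding a message and its cell changes. The `Commit`/`History` types are assumptions for this example; on the Hub, the edits are simply commits to the dataset repo.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Commit:
    message: str
    changes: tuple  # (row, column, old, new) tuples


@dataclass
class History:
    commits: list = field(default_factory=list)

    def commit(self, message, changes):
        """Append a commit; existing entries are never rewritten."""
        if not changes:
            raise ValueError("nothing to commit")
        self.commits.append(Commit(message, tuple(changes)))

    def trace(self, row, column):
        """Return (message, old, new) for each commit that touched this cell, oldest first."""
        return [
            (c.message, old, new)
            for c in self.commits
            for (r, col, old, new) in c.changes
            if (r, col) == (row, column)
        ]


history = History()
history.commit(
    "Fix label typo negativ -> negative",
    [(1, "label", "negativ", "negative"), (4, "label", "negativ", "negative")],
)
history.commit("Correct mislabeled examples", [(2, "label", "positive", "negative")])

print(len(history.commits))  # → 2
print(history.trace(1, "label"))  # → [('Fix label typo negativ -> negative', 'negativ', 'negative')]
```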
  5. After a round of edits, you can keep iterating. Say you spot mislabeled examples (e.g., positive instead of negative): just edit the target cells and commit the changes with a new message:
[Screenshot: Several examples are mislabeled]
[Screenshot: Correcting the mislabeled examples]
[Screenshot: Committing the label corrections]
[Screenshot: Applied changes are visible in the dataset versioning]
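When several people curate the same dataset, a correction is only safe to apply if the cell still holds the value the reviewer saw. The sketch below expresses that locally as a per-cell compare-and-set. This is a design choice of the example (the `apply_corrections` helper and sample rows are hypothetical), not a description of how the Hub handles concurrent edits.

```python
def apply_corrections(rows, column, corrections):
    """Apply {row_index: (expected_old, new)} to a copy of `rows`.

    A correction is applied only if the cell still equals `expected_old`;
    otherwise it is reported as a conflict and the cell is left unchanged.
    """
    edited = [dict(row) for row in rows]
    applied, conflicts = [], []
    for i, (expected_old, new) in sorted(corrections.items()):
        current = edited[i][column]
        if current == expected_old:
            edited[i][column] = new
            applied.append(i)
        else:
            conflicts.append((i, expected_old, current))
    return edited, applied, conflicts


# Hypothetical rows: 0 and 2 are labeled positive but read as negative.
rows = [
    {"text": "Terrible acting", "label": "positive"},
    {"text": "Great soundtrack", "label": "positive"},
    {"text": "Would not watch again", "label": "positive"},
    {"text": "Boring plot", "label": "negative"},
]

corrections = {
    0: ("positive", "negative"),
    2: ("positive", "negative"),
    3: ("positive", "negative"),  # stale: the cell no longer holds "positive"
}

edited, applied, conflicts = apply_corrections(rows, "label", corrections)
print(applied)    # → [0, 2]
print(conflicts)  # → [(3, 'positive', 'negative')]
```

Reporting conflicts instead of overwriting keeps each round of corrections limited to the changes the reviewer actually intended.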

What's next

This is just the beginning for dataset curation on the Hub. The team is actively working on what comes next. I'm personally excited to see how AI models can help you curate data faster and better directly on the Hub and in your browser. Stay tuned.

And get involved!

Try it out and share your feedback. Leave a comment on this blog post.
