Curating datasets directly on the Hub

Community Article Published November 27, 2025

You can now edit datasets directly on the Hub. This is huge: no more download/edit/upload cycles for fixes and quick data curation. It's early days, but I expect this to change how datasets for AI are maintained.

The most interesting part, in my opinion, is collaborative dataset curation. Multiple people across your organization can make commits to the same dataset, review changes, and improve data quality together - all with full versioning and traceability.

In this post, I'll walk you through how it works with a practical example.

Requirements

Currently, you can edit datasets on the Hub if:

The dataset contains a single CSV file (more formats coming).
You have write access (your personal datasets or any org/dataset where you have write permissions).
The dataset has textual (string) columns, since those are the cells you can edit.
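If you want to pre-check the first requirement before opening the Hub, here is a minimal sketch. It only looks at a list of file paths and assumes that housekeeping files such as README.md and .gitattributes do not count as data files. That assumption and the helper name `single_csv_file` are mine, not the Hub's actual eligibility logic.

```python
from pathlib import PurePosixPath

# Assumption: repo housekeeping files that do not count as data files.
IGNORED_FILES = {"README.md", ".gitattributes"}


def single_csv_file(repo_files):
    """Return the path of the only data file if it is a CSV, else None."""
    data_files = [
        path for path in repo_files
        if PurePosixPath(path).name not in IGNORED_FILES
    ]
    if len(data_files) == 1 and PurePosixPath(data_files[0]).suffix.lower() == ".csv":
        return data_files[0]
    return None


# Hypothetical file listings for three dataset repos.
print(single_csv_file(["README.md", ".gitattributes", "train.csv"]))
print(single_csv_file(["README.md", "train.csv", "test.csv"]))
print(single_csv_file(["README.md", "data/train.parquet"]))
```

The file listing itself could come from `HfApi().list_repo_files(repo_id, repo_type="dataset")` in the `huggingface_hub` library. I left that call out so the sketch runs offline.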

Walkthrough: fixing dataset errors

Say you or your team published a sentiment analysis dataset, and someone spots errors. Here's how you fix them.

Go to the dataset page.
Go to Data Studio to inspect the dataset. For example, in the screenshot below, the label distribution reveals an error: some values are spelled "negativ" instead of "negative".

Data Studio showing value distribution — The value distribution shows three values when only two are expected

Filtered view showing typo — Filtering confirms there's a typo in some label names
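The same check can be reproduced offline on a local copy of the CSV. This is a rough sketch using only the Python standard library. The sample rows, the `label` column name, and the expected label set are made up for illustration and are not taken from the dataset in the screenshots.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the dataset's CSV file.
SAMPLE_CSV = """text,label
Great product,positive
Terrible service,negative
Would not buy again,negativ
Loved it,positive
Broke after a day,negativ
"""

# Assumption: a binary sentiment task with exactly these two labels.
EXPECTED_LABELS = {"positive", "negative"}


def label_counts(csv_text, column="label"):
    """Count how often each value appears in the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


counts = label_counts(SAMPLE_CSV)
# Any label outside the expected set is a candidate typo.
unexpected = {label: n for label, n in counts.items() if label not in EXPECTED_LABELS}

print(counts)
print(unexpected)
```

Seeing three distinct values where two are expected is the same signal the value distribution gives you in Data Studio.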

If you have write access, you will see a Toggle Edit Mode button. Click it to edit individual cells in string columns, as in the screenshot below:

Edit mode interface — Editing individual cells in the dataset

Once you're happy with your edits, click Commit. You can then write a descriptive commit message, and your changes are committed to the dataset repo:

Commit interface showing changes — Ready to commit two changes

Commit message dialog — Adding a descriptive commit message

The resulting commit appears in the dataset history, so every curation action can be traced back:

Dataset history view — Changes are versioned in the dataset history
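For comparison, here is roughly what the same typo fix looks like when done by hand on a local copy of the CSV, which is the download/edit/upload cycle the Hub editor replaces. It is a minimal sketch: the sample data, the `label` column, and the `fix_labels` helper are hypothetical, and the download and upload steps are left out.

```python
import csv
import io

# Hypothetical sample standing in for the downloaded CSV file.
SAMPLE_CSV = "text,label\nWould not buy again,negativ\nLoved it,positive\n"

# Mapping from misspelled label to the intended one.
FIXES = {"negativ": "negative"}


def fix_labels(csv_text, fixes, column="label"):
    """Apply the label fixes and return (new_csv_text, number_of_changed_cells)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    changed = 0
    for row in rows:
        if row[column] in fixes:
            row[column] = fixes[row[column]]
            changed += 1
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue(), changed


fixed_csv, changed = fix_labels(SAMPLE_CSV, FIXES)
# A descriptive message, like the one you would type in the commit dialog.
commit_message = f"Fix {changed} misspelled label(s): negativ -> negative"

print(fixed_csv)
print(commit_message)
```

Counting the changed cells before committing gives you a quick sanity check, similar to the number of pending changes shown in the commit interface.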

Once you're done with a round of edits, you can keep making changes iteratively. Say you identify mislabeled examples (e.g., positive instead of negative): just edit the target cells and commit your changes with a new message:

Mislabeled examples highlighted — Several examples are mislabeled

Correcting labels — Correcting the mislabeled examples

Committing label corrections — Committing the label corrections

Final applied changes — Applied changes are visible in the dataset versioning
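If you want to review a round of edits outside the browser, you can compare two versions of the CSV cell by cell. The sketch below is one way to do it and assumes both versions have the same columns and the same row order, which holds for in-place cell edits but not for added or removed rows. The sample data and the `changed_cells` helper are hypothetical.

```python
import csv
import io


def changed_cells(before_csv, after_csv):
    """List (row_index, column, old_value, new_value) for cells that differ.

    Assumes both versions share the same columns and row order.
    """
    before = list(csv.DictReader(io.StringIO(before_csv)))
    after = list(csv.DictReader(io.StringIO(after_csv)))
    changes = []
    for index, (old_row, new_row) in enumerate(zip(before, after)):
        for column, old_value in old_row.items():
            if new_row[column] != old_value:
                changes.append((index, column, old_value, new_row[column]))
    return changes


# Hypothetical versions of the file before and after a label correction.
BEFORE = "text,label\nBroke after a day,positive\nLoved it,positive\n"
AFTER = "text,label\nBroke after a day,negative\nLoved it,positive\n"

changes = changed_cells(BEFORE, AFTER)
for row_index, column, old, new in changes:
    print(f"row {row_index}, column {column!r}: {old!r} -> {new!r}")
```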

What's next

This is just the beginning for dataset curation on the Hub. The team is actively working on what comes next. I'm personally excited to see how AI models can help you curate data faster and better directly on the Hub and in your browser. Stay tuned.

And get involved!

Try it out and share your feedback. Leave a comment on this blog post.
