AblationBench - an ai-coscientist Collection

updated May 16

This is a collection of datasets for evaluating language models on the task of ablation planning in empirical AI research.
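The four datasets in this collection could be loaded together along the following lines. This is a minimal sketch, not an official loader: the repo IDs are copied from the listing, while `ABLATION_BENCH_REPOS` and `load_collection` are hypothetical names, and the code assumes each repo loads with its default configuration through the Hugging Face `datasets` library.

```python
from typing import Dict, Iterable, Optional

# Repo IDs as they appear in the collection listing.
# The dictionary name and keys are illustrative, not part of any official API.
ABLATION_BENCH_REPOS: Dict[str, str] = {
    "ResearcherAblationBench": "ai-coscientist/researcher-ablation-bench",
    "ReviewerAblationBench": "ai-coscientist/reviewer-ablation-bench",
    "ResearcherAblationJudgeEval": "ai-coscientist/researcher-ablation-judge-eval",
    "ReviewerAblationJudgeEval": "ai-coscientist/reviewer-ablation-judge-eval",
}


def load_collection(names: Optional[Iterable[str]] = None) -> dict:
    """Load the selected datasets (all four when `names` is None).

    Assumption: every repo can be loaded without an explicit config name.
    If a repo defines several configs, pass one to `load_dataset` instead.
    """
    # Imported lazily so the mapping above is usable without `datasets` installed.
    from datasets import load_dataset

    selected = list(names) if names is not None else list(ABLATION_BENCH_REPOS)
    return {name: load_dataset(ABLATION_BENCH_REPOS[name]) for name in selected}
```

Calling `load_collection(["ResearcherAblationBench"])` would download only that dataset; split and column names are not listed here and should be read from each dataset's own page.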

ai-coscientist/researcher-ablation-bench

Viewer • Updated May 15 • 83 • 44

Note ResearcherAblationBench is a benchmark that aims to assist authors in proposing ablation plans based on a paper's method section. It contains 83 AI conference papers, along with human-annotated ablations found in the original papers.

ai-coscientist/reviewer-ablation-bench

Viewer • Updated May 15 • 6.26k • 15

Note ReviewerAblationBench is a benchmark that aims to assist reviewers in identifying ablation experiments missing from a paper submission. It contains 350 ICLR submissions from 2023-2025, along with official reviews that contain suggestions for missing ablation experiments.

ai-coscientist/researcher-ablation-judge-eval

Viewer • Updated May 15 • 63 • 20

Note ResearcherAblationJudgeEval is a benchmark that aims to support the automatic, LM-judge-based evaluation framework for ResearcherAblationBench. It contains 63 ablation plans, along with human annotations of whether they are found among the ablations in the original papers.

ai-coscientist/reviewer-ablation-judge-eval

Viewer • Updated May 15 • 60 • 4

Note ReviewerAblationJudgeEval is a benchmark that aims to support the automatic, LM-judge-based evaluation framework for ReviewerAblationBench. It contains 60 missing-ablation plans, along with human annotations of whether they are found in one of the reviewers' official comments.
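
Both judge-eval datasets pair each plan with a human yes/no annotation, so an LM judge can be scored by comparing its verdicts with those labels. The sketch below shows one plausible way to do that with precision, recall, and F1. The metric choice and the names `JudgeScores` and `score_judge` are assumptions made for illustration, and the page does not state which metrics the benchmark itself reports.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class JudgeScores:
    precision: float
    recall: float
    f1: float


def score_judge(human_labels: Sequence[bool], judge_labels: Sequence[bool]) -> JudgeScores:
    """Compare LM-judge verdicts with human annotations (True = plan was found).

    Hypothetical helper: how labels are stored in the datasets is not
    specified in the listing, so plain boolean sequences are assumed here.
    """
    if len(human_labels) != len(judge_labels):
        raise ValueError("human_labels and judge_labels must have the same length")

    pairs = list(zip(human_labels, judge_labels))
    tp = sum(1 for h, j in pairs if h and j)        # judge agrees on a match
    fp = sum(1 for h, j in pairs if not h and j)    # judge reports a match humans did not
    fn = sum(1 for h, j in pairs if h and not j)    # judge misses a human-confirmed match

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return JudgeScores(precision=precision, recall=recall, f1=f1)
```

With human labels `[True, True, False, False]` and judge verdicts `[True, False, True, False]`, the helper counts one true positive, one false positive, and one false negative, giving precision, recall, and F1 of 0.5 each.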