MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes
Abstract
MoReBench and MoReBench-Theory provide benchmarks for evaluating AI's moral reasoning and decision-making processes, highlighting the need for process-focused evaluation and transparency in AI systems.
As AI systems progress, we rely more on them to make decisions with us and for us. To ensure that such decisions are aligned with human values, it is imperative that we understand not only what decisions they make but also how they arrive at them. Reasoning language models, which provide both final responses and (partially transparent) intermediate thinking traces, present a timely opportunity to study AI procedural reasoning. Unlike math and code problems, which often have objectively correct answers, moral dilemmas are an excellent testbed for process-focused evaluation because they allow for multiple defensible conclusions. To support such evaluation, we present MoReBench: 1,000 moral scenarios, each paired with a set of rubric criteria that experts consider essential to include (or avoid) when reasoning about the scenario. MoReBench contains over 23,000 criteria, including identifying moral considerations, weighing trade-offs, and giving actionable recommendations, covering cases where AI advises humans on moral decisions as well as cases where it makes moral decisions autonomously. Separately, we curate MoReBench-Theory: 150 examples that test whether AI can reason under five major frameworks in normative ethics. Our results show that scaling laws and existing benchmarks on math, code, and scientific reasoning tasks fail to predict models' abilities to perform moral reasoning. Models also show partiality towards specific moral frameworks (e.g., Benthamite Act Utilitarianism and Kantian Deontology), which may be a side effect of popular training paradigms. Together, these benchmarks advance process-focused reasoning evaluation towards safer and more transparent AI.
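The abstract does not specify the released data schema, but the core idea of pairing each scenario with expert rubric criteria that a reasoning trace should include (or avoid) can be illustrated with a minimal sketch. The field names (`scenario`, `criteria`, `polarity`) and the `satisfies` judge function below are hypothetical placeholders, not the actual MoReBench format.

```python
# Hypothetical sketch of rubric-based, process-focused scoring.
# Field names and the `satisfies` judge are illustrative assumptions,
# not the released MoReBench schema or evaluation pipeline.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    text: str      # e.g., "Identifies the conflict between honesty and loyalty"
    polarity: int  # +1 if the reasoning should include this, -1 if it should avoid it


@dataclass
class MoralScenario:
    scenario: str
    criteria: List[Criterion]


def rubric_score(
    reasoning_trace: str,
    item: MoralScenario,
    satisfies: Callable[[str, str], bool],
) -> float:
    """Return the fraction of rubric criteria the reasoning trace handles correctly.

    `satisfies(trace, criterion_text)` stands in for an expert or LLM judge
    that decides whether a criterion is addressed in the trace.
    """
    correct = 0
    for c in item.criteria:
        present = satisfies(reasoning_trace, c.text)
        # A +1 criterion is correct when present; a -1 criterion when absent.
        correct += int(present == (c.polarity > 0))
    return correct / len(item.criteria)
```

Scoring the reasoning trace against criteria, rather than grading only the final recommendation, is what distinguishes this process-focused setup from outcome-only benchmarks.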
Community
Large-scale, rubric-based moral reasoning benchmark built w/ over 50 PhDs in morality. Collab w/ Scale AI
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- The Morality of Probability: How Implicit Moral Biases in LLMs May Shape the Future of Human-AI Symbiosis (2025)
- One Model, Many Morals: Uncovering Cross-Linguistic Misalignments in Computational Moral Reasoning (2025)
- Advancing Automated Ethical Profiling in SE: a Zero-Shot Evaluation of LLM Reasoning (2025)
- From Literal to Liberal: A Meta-Prompting Framework for Eliciting Human-Aligned Exception Handling in Large Language Models (2025)
- HumanAgencyBench: Scalable Evaluation of Human Agency Support in AI Assistants (2025)
- CogniAlign: Survivability-Grounded Multi-Agent Moral Reasoning for Safe and Transparent AI (2025)
- Exploring Chain-of-Thought Reasoning for Steerable Pluralistic Alignment (2025)