MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes Paper • 2510.16380 • Published 8 days ago • 2
CulturalBench Collection A Robust, Diverse and Challenging Benchmark for Measuring Cultural Knowledge of LLMs • 6 items • Updated Jun 6
Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas Paper • 2505.14633 • Published May 20 • 3