FlagEval

non-profit

https://flageval.baai.ac.cn/

Recent Activity

xuanricheng authored a paper about 1 month ago

FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions

philokey authored a paper about 1 month ago

CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning

philokey authored a paper about 1 month ago

Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs

FlagEval's Spaces (2)

FlagEval-Arena

Arena

FlagEval-Debate

Display a debate interface