FlagEval
non-profit
https://flageval.baai.ac.cn/
Recent Activity
xuanricheng authored a paper about 1 month ago: FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions
philokey authored a paper about 1 month ago: CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning
philokey authored a paper about 1 month ago: Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
Team members
11
spaces
2
FlagEval-Arena 🐢 • Arena • Running • 6
FlagEval-Debate 🐠 • Display a debate interface • Running • 12
models
1
FlagEval/flageval_judgemodel • Text Generation • 33B • Updated Dec 30, 2024 • 1 • 1
datasets
11
FlagEval/EmbodiedVerse-Bench • Updated Jun 25 • 2.04k • 101
FlagEval/Where2Place • Updated May 29 • 100 • 89
FlagEval/SAT • Updated May 6 • 150 • 45
FlagEval/HMMT_2025 • Updated May 6 • 30 • 36
FlagEval/ERQA • Updated Apr 22 • 400 • 349 • 2
FlagEval/sub_spatial • Updated Apr 21 • 690 • 6
FlagEval/EmbSpatial-Bench • Updated Apr 21 • 3.64k • 106 • 2
FlagEval/coco_val2014_sampled • Updated Nov 21, 2024 • 1k • 15
FlagEval/documentation-images • Updated Nov 13, 2024 • 3 • 155
FlagEval/CLCC_v1 • Updated Jul 29, 2024 • 760 • 19 • 3