Open FlagEval-VLM Leaderboard

Welcome to the Open FlagEval-VLM Leaderboard! The Open FlagEval-VLM Leaderboard is designed to track, rank, and evaluate open visual large language models (VLMs). The leaderboard is powered by the FlagEval platform, which provides the compute resources and runtime environment. The evaluation is organized as a dataset-based capability framework: from the open-source datasets integrated so far, we derive six capability dimensions (mathematics, vision, charts, general, text, and Chinese), which together form the evaluation suite.

T: 🟢
Model (model_name_for_query): XGen-MM-Instruct-Interleave-v1.5
Average ⬆️: 60.35
CMMMU: 61.11
MMMU: 62.89
OCRBench: 86.19
MMMU_Pro_standard: 38.61
MMMU_Pro_vision: 40.54
MathVision: 53.79
CII-Bench: 67.72
Blink: 64.11
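
The six capability dimensions described in the introduction could, in principle, be derived from per-benchmark scores. The sketch below is only an illustration under stated assumptions: the mapping from benchmarks to dimensions (`DIMENSIONS`) and the helper `dimension_scores` are hypothetical and not taken from the leaderboard, and the plain mean used here is not claimed to be how the leaderboard computes its "Average" column. The scores are copied from the row shown on this page.

```python
from statistics import mean

# Per-benchmark scores copied from the leaderboard row on this page.
scores = {
    "CMMMU": 61.11,
    "MMMU": 62.89,
    "OCRBench": 86.19,
    "MMMU_Pro_standard": 38.61,
    "MMMU_Pro_vision": 40.54,
    "MathVision": 53.79,
    "CII-Bench": 67.72,
    "Blink": 64.11,
}

# ASSUMPTION: this benchmark-to-dimension mapping is hypothetical and only
# illustrates the idea; the leaderboard does not publish it on this page.
DIMENSIONS = {
    "Mathematics": ["MathVision"],
    "Vision": ["Blink"],
    "Charts": [],  # no benchmark in this row is assigned here
    "General": ["MMMU", "MMMU_Pro_standard", "MMMU_Pro_vision"],
    "Text": ["OCRBench"],
    "Chinese": ["CMMMU", "CII-Bench"],
}


def dimension_scores(scores, dimensions):
    """Return the mean score per dimension, or None if no benchmark applies."""
    result = {}
    for name, benchmarks in dimensions.items():
        # Only use benchmarks that actually have a score in this row.
        present = [scores[b] for b in benchmarks if b in scores]
        result[name] = round(mean(present), 2) if present else None
    return result


if __name__ == "__main__":
    for name, value in dimension_scores(scores, DIMENSIONS).items():
        print(f"{name}: {value}")
```

A dimension with no assigned benchmark is reported as `None` rather than 0, so that a missing dataset is not mistaken for a poor score.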