
PRISM Benchmark & FLUX-Reason-6M Dataset

[🌐 Homepage] [🤗 Hugging Face Dataset] [📊 Leaderboard] [📊 Leaderboard-ZH] [📖 Paper]

Rongyao Fang¹*  Aldrich Yu¹*  Chengqi Duan²*  Linjiang Huang³  Shuai Bai⁴

Yuxuan Cai⁴  Kun Wang⁵  Si Liu³  Xihui Liu²†  Hongsheng Li¹†

¹CUHK   ²HKU   ³BUAA   ⁴Alibaba   ⁵SenseTime

*Equal Contribution  †Corresponding Author

📖 Introduction

🌟 This is the official repository for the paper "FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark". It contains both the evaluation code and the data for PRISM-Bench.

[Teaser figure]

We introduce FLUX-Reason-6M and PRISM-Bench. FLUX-Reason-6M is a 6-million-scale synthetic dataset designed to bring reasoning capabilities to text-to-image (T2I) generation. PRISM-Bench is a comprehensive and discriminative benchmark with 7 independent tracks that closely align with human judgment.

💥 News

  • [2025-09-12] Our paper is now available on arXiv.
  • [2025-09-12] Our FLUX-Reason-6M dataset is now available on Hugging Face.

📈 Evaluation

Data

Please organize the image data as follows.

images
├── imagination
│   ├── 0.png
│   ├── 1.png
│   ├── ...
│   └── 99.png
├── entity
├── text_rendering
├── style
├── affection
├── composition
└── long_text
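
For a quick sanity check before running the evaluation, the sketch below walks this layout and reports anything missing. It is a hypothetical helper, not part of the repository; the track names mirror the directories above, and the assumption that every track holds 0.png through 99.png is extrapolated from the imagination example.

# check_layout.py — hypothetical helper; verifies the image layout expected by the evaluation scripts
import os
import sys

# The seven PRISM-Bench tracks, matching the directory names above.
TRACKS = ["imagination", "entity", "text_rendering", "style",
          "affection", "composition", "long_text"]

def check_layout(image_path: str, images_per_track: int = 100) -> list:
    """Return a list of missing directories/files under image_path."""
    missing = []
    for track in TRACKS:
        track_dir = os.path.join(image_path, track)
        if not os.path.isdir(track_dir):
            missing.append(track_dir)
            continue
        # Assumption: every track contains 0.png ... 99.png, as shown for imagination.
        for i in range(images_per_track):
            img = os.path.join(track_dir, f"{i}.png")
            if not os.path.isfile(img):
                missing.append(img)
    return missing

if __name__ == "__main__":
    problems = check_layout(sys.argv[1] if len(sys.argv) > 1 else "images")
    print("Layout OK" if not problems else "\n".join(problems))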

PRISM-Bench Evaluation

Eval with GPT-4.1:

python evaluation/eval_gpt41.py --image_path <path to image data> --api_key <OpenAI API key> --base_url <OpenAI base URL for custom or proxy endpoints>

Eval with Qwen2.5-VL-72B:

python evaluation/eval_qwen25.py --image_path <path to image data> --model_path <path to qwen model> --output_dir <path to save results>

PRISM-Bench-ZH Evaluation

Eval with GPT-4.1:

python evaluation/eval_gpt41.py --image_path <path to image data> --api_key <OpenAI API key> --base_url <OpenAI base URL for custom or proxy endpoints> --zh

Eval with Qwen2.5-VL-72B:

python evaluation/eval_qwen25.py --image_path <path to image data> --model_path <path to qwen model> --output_dir <path to save results> --zh
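
To run both the English and Chinese GPT-4.1 evaluations back to back, a small driver like the one below can wrap the documented command. The wrapper itself is a hypothetical convenience script; only the flags shown above (--image_path, --api_key, --base_url, --zh) come from the repository, and you may need to point --image_path at a separate folder of images generated from the Chinese prompts.

# run_gpt41_all.py — hypothetical wrapper around evaluation/eval_gpt41.py; uses only the flags documented above
import subprocess
import sys

def run_eval(image_path: str, api_key: str, base_url: str, zh: bool = False) -> None:
    cmd = [
        sys.executable, "evaluation/eval_gpt41.py",
        "--image_path", image_path,
        "--api_key", api_key,
        "--base_url", base_url,
    ]
    if zh:
        cmd.append("--zh")  # switch to the PRISM-Bench-ZH prompts
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    image_path, api_key, base_url = sys.argv[1:4]
    run_eval(image_path, api_key, base_url, zh=False)  # PRISM-Bench
    run_eval(image_path, api_key, base_url, zh=True)   # PRISM-Bench-ZH (swap image_path if ZH images live elsewhere)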

📊 Benchmark

The leaderboard is available here.

PRISM-Bench (GPT-4.1)
# Model Source Date Overall (Align) Overall (Aes) Overall (Avg) Imagination (Align) Imagination (Aes) Imagination (Avg) Entity (Align) Entity (Aes) Entity (Avg) Text rendering (Align) Text rendering (Aes) Text rendering (Avg) Style (Align) Style (Aes) Style (Avg) Affection (Align) Affection (Aes) Affection (Avg) Composition (Align) Composition (Aes) Composition (Avg) Long text (Align) Long text (Aes) Long text (Avg)
1 GPT-Image-1 [High] 🥇 Link 2025-09-10 86.9 85.6 86.3 86.2 86.6 86.4 90.0 86.3 88.2 68.8 80.1 74.5 92.8 93.3 93.1 90.7 90.9 90.8 96.2 89.4 92.8 83.8 72.8 78.3
2 Gemini2.5-Flash-Image 🥈 Link 2025-09-10 87.1 83.4 85.3 92.4 84.8 88.6 87.0 81.3 84.2 65.2 74.1 69.7 90.5 90.8 90.7 96.0 88.2 92.1 92.5 88.5 90.5 85.9 76.2 81.1
3 Qwen-Image 🥉 Link 2025-09-10 81.1 78.6 79.9 80.5 78.6 79.6 79.3 73.2 76.3 54.3 68.9 61.6 84.5 88.7 86.6 91.6 89.1 90.4 93.7 86.9 90.3 83.8 65.1 74.5
4 SEEDream 3.0 Link 2025-09-10 80.5 78.7 79.6 77.3 76.4 76.9 80.2 73.8 77.0 56.1 70.2 63.2 83.9 87.4 85.7 89.3 90.3 89.8 93.3 86.3 89.8 83.2 66.7 75.0
5 HiDream-I1-Full Link 2025-09-10 76.1 75.6 75.9 74.4 75.6 75.0 74.4 72.4 73.4 58.2 70.4 64.3 81.4 84.8 83.1 90.1 88.8 89.5 90.1 85.4 87.8 63.8 52.0 57.9
6 FLUX.1-Krea-dev Link 2025-09-10 74.3 75.1 74.7 71.5 73.0 72.3 69.5 67.5 68.5 47.5 61.3 54.4 80.8 83.5 82.2 84.0 90.3 87.2 90.9 85.8 88.4 76.2 64.1 70.2
7 FLUX.1-dev Link 2025-09-10 72.4 74.9 73.7 68.1 74.0 71.1 70.7 71.2 71.0 48.1 64.5 56.3 72.3 80.5 76.4 88.3 91.1 89.7 89.0 84.6 86.8 70.6 58.5 64.6
8 SD3.5-Large Link 2025-09-10 73.9 73.5 73.7 73.3 71.2 72.3 76.7 71.9 74.3 52.0 65.8 58.9 77.1 84.2 80.7 87.1 85.2 86.2 87.0 84.7 85.9 64.3 51.7 58.0
9 HiDream-I1-Dev Link 2025-09-10 70.3 70.0 70.2 68.2 69.7 69.0 72.0 67.0 69.5 53.4 64.1 58.8 68.7 78.6 73.7 84.2 83.1 83.7 87.6 79.8 83.7 58.1 47.5 52.8
10 SD3.5-Medium Link 2025-09-10 70.1 68.9 69.5 69.5 73.0 71.3 72.8 63.7 68.3 33.3 50.1 41.7 77.4 80.3 78.9 84.9 85.5 85.2 89.4 79.2 84.3 63.3 50.5 56.9
11 SD3-Medium Link 2025-09-10 65.6 65.2 65.4 61.0 65.6 63.3 64.8 56.3 60.6 32.8 53.1 43.0 74.8 75.6 75.2 78.7 80.3 79.5 85.5 79.1 82.3 61.5 46.1 53.8
12 Bagel-CoT Link 2025-09-10 65.4 65.0 65.2 68.4 74.2 71.3 62.4 60.0 61.2 23.2 40.1 31.7 64.4 70.1 67.3 87.1 80.5 83.8 88.5 77.9 83.2 64.0 52.0 58.0
13 Bagel Link 2025-09-10 66.7 63.4 65.1 69.4 68.0 68.7 59.0 50.1 54.6 30.2 44.5 37.4 67.9 71.3 69.6 81.7 81.4 81.6 90.5 73.1 81.8 68.1 55.3 61.7
14 FLUX.1-schnell Link 2025-09-10 67.1 61.2 64.2 63.3 66.2 64.8 61.8 51.2 56.5 46.2 54.1 50.2 68.6 70.1 69.4 75.4 69.9 72.7 85.1 67.5 76.3 69.4 49.7 59.6
15 Playground Link 2025-09-10 62.6 65.6 64.1 62.3 70.6 66.5 72.5 69.1 70.8 10.4 37.3 23.9 77.3 80.9 79.1 91.8 83.8 87.8 77.5 76.5 77.0 46.7 41.0 43.9
16 JanusPro-7B Link 2025-09-10 64.2 57.2 60.7 70.4 65.8 68.1 67.1 51.9 59.5 15.5 36.7 26.1 71.4 73.8 72.6 79.2 71.5 75.4 83.7 61.0 72.4 62.4 39.7 51.1
17 SDXL Link 2025-09-10 58.9 61.8 60.4 55.3 61.1 58.2 72.5 67.4 70.0 13.8 37.0 25.4 72.4 75.4 73.9 78.9 77.1 78.0 75.5 75.3 75.4 44.2 39.6 41.9
18 SD2.1 Link 2025-09-10 50.7 45.3 48.0 47.9 41.2 44.6 60.9 46.7 53.8 11.2 30.6 20.9 62.7 58.6 60.7 66.7 58.5 62.6 65.7 53.1 59.4 40.1 28.2 34.2
19 SD1.5 Link 2025-09-10 44.9 43.5 44.2 36.6 36.1 36.4 53.8 41.1 47.5 8.0 33.1 20.6 55.3 55.3 55.3 64.4 57.5 61.0 61.1 51.0 56.1 35.3 30.4 32.9
PRISM-Bench (Qwen2.5-VL)
# Model Source Date Overall (Align) Overall (Aes) Overall (Avg) Imagination (Align) Imagination (Aes) Imagination (Avg) Entity (Align) Entity (Aes) Entity (Avg) Text rendering (Align) Text rendering (Aes) Text rendering (Avg) Style (Align) Style (Aes) Style (Avg) Affection (Align) Affection (Aes) Affection (Avg) Composition (Align) Composition (Aes) Composition (Avg) Long text (Align) Long text (Aes) Long text (Avg)
1 GPT-Image-1 [High] 🥇 Link 2025-09-10 82.7 78.7 80.7 79.8 53.3 66.6 87.3 81.0 84.1 66.7 86.8 76.8 87.3 87.8 87.5 88.1 79.8 84.0 92.2 84.9 88.5 77.2 77.5 77.4
2 Gemini2.5-Flash-Image 🥈 Link 2025-09-10 85.0 75.8 80.4 84.7 38.1 61.4 86.0 76.7 81.3 72.8 84.3 78.5 89.5 87.8 88.6 94.3 74.8 84.5 91.2 88.2 89.7 76.3 80.6 78.4
3 SEEDream 3.0 🥉 Link 2025-09-10 80.1 72.3 76.2 75.8 38.0 56.9 81.3 74.2 77.7 58.8 74.0 66.4 84.4 84.1 84.2 90.5 74.6 82.5 93.6 85.1 89.3 76.2 76.4 76.3
4 Qwen-Image Link 2025-09-10 80.0 68.3 74.1 75.5 37.4 56.5 79.5 64.5 72.0 57.9 71.2 64.5 86.6 84.4 85.5 89.9 70.4 80.1 93.9 79.5 86.7 76.8 70.9 73.8
5 FLUX.1-Krea-dev Link 2025-09-10 74.4 73.7 74.0 69.6 43.1 56.3 72.2 70.7 71.4 51.7 76.1 63.9 80.0 86.6 83.3 82.6 78.7 80.6 90.8 87.1 88.9 73.6 73.4 73.5
6 HiDream-I1-Full Link 2025-09-10 76.6 68.6 72.6 73.0 44.0 58.5 76.3 72.8 74.5 60.5 76.4 68.4 81.4 81.5 81.4 90.0 76.6 83.3 88.5 80.3 84.4 66.3 48.6 57.4
7 SD3.5-Large Link 2025-09-10 73.4 67.8 70.6 66.7 43.4 55.0 76.8 72.7 74.8 53.6 73.1 63.3 77.3 78.2 77.7 85.6 73.9 79.7 87.8 80.9 84.3 65.8 52.2 59.0
8 HiDream-I1-Dev Link 2025-09-10 72.3 67.0 69.6 68.8 45.8 57.3 73.5 68.1 70.8 56.7 75.7 66.2 70.2 77.4 73.8 88.2 74.3 81.2 84.7 78.5 81.6 64.0 49.3 56.6
9 FLUX.1-dev Link 2025-09-10 72.1 64.9 68.5 65.5 42.9 54.2 70.6 61.9 66.2 52.3 73.0 62.6 72.6 74.2 73.4 86.0 72.9 79.4 87.4 75.8 81.6 70.5 53.8 62.1
10 SD3.5-Medium Link 2025-09-10 68.6 65.1 66.8 65.1 34.7 49.9 72.5 70.9 71.7 36.6 64.5 50.5 75.5 80.0 77.7 81.8 73.9 77.9 85.4 81.0 83.2 63.5 50.6 57.0
11 SD3-Medium Link 2025-09-10 68.0 64.2 66.1 64.3 37.7 51.0 69.4 63.3 66.3 38.5 63.3 50.9 74.6 79.5 77.0 80.5 75.5 78.0 85.6 79.5 82.5 63.4 50.3 56.8
12 FLUX.1-schnell Link 2025-09-10 68.3 61.1 64.7 62.8 35.6 49.2 64.8 56.8 60.8 54.3 68.1 61.2 70.3 71.5 70.9 75.4 65.9 70.6 81.7 75.6 78.6 68.7 54.4 61.5
13 JanusPro-7B Link 2025-09-10 64.9 59.4 62.1 65.0 38.8 51.9 68.6 63.5 66.0 23.1 50.3 36.7 70.7 75.2 72.9 80.7 68.0 74.3 82.4 71.1 76.7 63.9 49.0 56.4
14 Bagel-CoT Link 2025-09-10 67.5 56.5 62.0 68.0 44.1 56.0 67.6 53.4 60.5 29.4 42.3 35.8 69.0 69.7 69.3 87.1 66.7 76.9 86.6 69.2 77.9 64.5 50.2 57.3
15 Bagel Link 2025-09-10 67.5 56.6 62.0 68.0 45.0 56.5 67.6 53.4 60.5 29.4 42.3 35.8 69.0 69.7 69.3 87.1 66.7 76.9 86.6 69.2 77.9 64.5 50.2 57.3
16 Playground Link 2025-09-10 62.2 52.1 57.1 59.0 39.0 49.0 69.4 56.7 63.0 15.3 31.9 23.6 74.6 74.6 74.6 88.8 66.0 77.4 72.2 61.3 66.7 56.0 35.3 45.6
17 SDXL Link 2025-09-10 60.1 54.0 57.0 54.5 34.1 44.3 71.1 65.0 68.0 18.6 37.3 27.9 71.7 72.6 72.1 78.7 66.5 72.6 72.2 67.8 70.0 54.1 34.5 44.3
18 SD2.1 Link 2025-09-10 54.0 47.7 50.8 48.9 28.4 38.6 66.0 57.6 61.8 16.7 31.4 24.0 62.7 66.5 64.6 68.5 62.1 65.3 64.8 58.3 61.5 50.7 29.8 40.2
19 SD1.5 Link 2025-09-10 48.8 43.3 46.0 40.7 23.7 32.2 61.2 52.7 56.9 11.4 24.1 17.8 56.7 61.5 59.1 66.9 60.7 63.8 57.5 53.4 55.4 47.3 26.8 37.0
PRISM-Bench-ZH (GPT-4.1)
# Model Source Date Overall (Align) Overall (Aes) Overall (Avg) Imagination (Align) Imagination (Aes) Imagination (Avg) Entity (Align) Entity (Aes) Entity (Avg) Text rendering (Align) Text rendering (Aes) Text rendering (Avg) Style (Align) Style (Aes) Style (Avg) Affection (Align) Affection (Aes) Affection (Avg) Composition (Align) Composition (Aes) Composition (Avg) Long text (Align) Long text (Aes) Long text (Avg)
1 GPT-Image-1 [High] 🥇 Link 2025-09-10 87.7 87.2 87.5 88.8 90.4 89.6 85.9 92.4 89.2 83.9 67.7 75.8 93.9 91.7 92.8 91.5 86.5 89.0 92.4 97.3 94.9 77.2 84.3 80.8
2 SEEDream 3.0 🥈 Link 2025-09-10 81.9 82.0 82.0 77.2 77.8 77.5 77.6 78.6 78.1 79.7 71.9 75.8 87.8 83.2 85.5 88.7 85.1 86.9 87.7 94.4 91.1 74.3 82.7 78.5
3 Qwen-Image 🥉 Link 2025-09-10 80.8 81.3 81.1 80.1 79.6 79.9 75.6 79.7 77.7 76.9 62.9 69.9 90.2 84.3 87.3 87.4 84.9 86.2 86.6 93.4 90.0 68.9 84.2 76.6
4 Bagel Link 2025-09-10 65.5 65.2 65.4 72.8 64.7 68.8 53.9 62.2 58.1 49.2 29.0 39.1 73.9 68.4 71.2 81.4 73.5 77.5 69.0 89.8 79.4 58.1 68.7 63.4
5 Bagel-CoT Link 2025-09-10 64.4 62.4 63.4 75.1 69.3 72.2 53.3 58.8 56.1 42.6 16.3 29.5 73.6 66.6 70.1 81.2 78.0 79.6 74.0 83.6 78.8 50.7 64.3 57.5
6 HiDream-I1-Full Link 2025-09-10 60.8 54.9 57.9 53.6 47.3 50.5 63.1 60.8 62.0 34.6 16.3 25.5 74.1 65.5 69.8 80.9 67.3 74.1 73.8 76.1 75.0 45.4 50.8 48.1
7 HiDream-I1-Dev Link 2025-09-10 55.0 48.3 51.7 47.3 41.1 44.2 52.8 49.0 50.9 35.2 14.5 24.9 64.5 52.4 58.5 76.3 66.5 71.4 67.6 68.3 68.0 41.1 46.4 43.8
PRISM-Bench-ZH (Qwen2.5-VL)
# Model Source Date Overall (Align) Overall (Aes) Overall (Avg) Imagination (Align) Imagination (Aes) Imagination (Avg) Entity (Align) Entity (Aes) Entity (Avg) Text rendering (Align) Text rendering (Aes) Text rendering (Avg) Style (Align) Style (Aes) Style (Avg) Affection (Align) Affection (Aes) Affection (Avg) Composition (Align) Composition (Aes) Composition (Avg) Long text (Align) Long text (Aes) Long text (Avg)
1 GPT-Image-1 [High] 🥇 Link 2025-09-10 78.0 77.4 77.7 73.0 37.6 55.3 80.4 82.1 81.3 73.1 89.9 81.5 77.1 92.4 84.8 78.0 77.8 77.9 91.9 85.7 88.8 72.4 76.3 74.4
2 SEEDream 3.0 🥈 Link 2025-09-10 76.2 73.2 74.7 71.4 36.6 54.0 74.8 73.8 74.3 70.7 88.0 79.4 74.1 88.0 81.1 79.0 71.4 75.2 90.3 83.2 86.8 73.0 71.2 72.1
3 Qwen-Image 🥉 Link 2025-09-10 75.0 65.5 70.3 71.4 29.9 50.7 74.7 67.8 71.3 64.3 73.1 68.7 75.2 83.2 79.2 77.3 64.5 70.9 89.8 74.1 82.0 72.6 65.8 69.2
4 Bagel-CoT Link 2025-09-10 62.0 57.4 59.7 64.4 36.6 50.5 62.6 53.8 58.2 25.2 51.9 38.6 65.4 76.7 71.1 74.0 65.0 69.5 81.3 71.3 76.3 61.4 46.6 54.0
5 Bagel Link 2025-09-10 61.5 54.3 57.9 64.6 36.3 50.5 62.7 55.5 59.1 18.6 26.3 22.5 66.0 76.6 71.3 74.9 66.2 70.6 81.3 72.2 76.8 62.4 47.3 54.9
6 HiDream-I1-Full Link 2025-09-10 55.9 55.3 55.6 51.2 30.8 41.0 60.1 61.3 60.7 20.7 40.6 30.7 64.5 73.8 69.2 65.2 69.1 67.2 72.4 69.0 70.7 57.1 42.8 50.0
7 HiDream-I1-Dev Link 2025-09-10 52.2 49.7 50.9 48.3 24.6 36.5 52.6 54.1 53.4 18.6 35.3 27.0 59.0 68.3 63.7 65.9 62.3 64.1 66.5 64.6 65.6 54.2 38.6 46.4

📝 Citation

If you find this work helpful, please consider citing:

@article{fang2025flux,
      title={FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark}, 
      author={Fang, Rongyao and Yu, Aldrich and Duan, Chengqi and Huang, Linjiang and Bai, Shuai and Cai, Yuxuan and Wang, Kun and Liu, Si and Liu, Xihui and Li, Hongsheng},
      journal={arXiv preprint arXiv:2509.09680},
      year={2025}
}
