WORLD-TO-IMAGE: GROUNDING TEXT-TO-IMAGE GENERATION WITH AGENT-DRIVEN WORLD KNOWLEDGE

This is the official repository for "WORLD-TO-IMAGE: GROUNDING TEXT-TO-IMAGE GENERATION WITH AGENT-DRIVEN WORLD KNOWLEDGE".

Moo Hyun (Kyle) Son¹, Jintaek Oh¹, Sun Bin Mun², Jaechul Roh³, Sehyun Choi⁴
¹The Hong Kong University of Science and Technology (HKUST) ²Georgia Institute of Technology ³University of Massachusetts Amherst ⁴Twelve Labs

Abstract. While text-to-image (T2I) models can synthesize high-quality images, their performance degrades significantly when prompted with novel or out-of-distribution (OOD) entities due to inherent knowledge cutoffs. We introduce World-To-Image, a novel framework that bridges this gap by empowering T2I generation with agent-driven world knowledge. We design an agent that dynamically searches the web to retrieve images for concepts unknown to the base model. This information is then used to perform multimodal prompt optimization, steering powerful generative backbones toward an accurate synthesis. Critically, our evaluation goes beyond traditional metrics, utilizing modern assessments like LLMGrader and ImageReward to measure true semantic fidelity. Our experiments show that World-To-Image substantially outperforms state-of-the-art methods in both semantic alignment and visual aesthetics, achieving a +8.1% improvement in accuracy-to-prompt on our curated NICE benchmark. Our framework achieves these results efficiently, in fewer than three iterations, paving the way for T2I systems that can better reflect the ever-changing real world.

Paper · Code

Component Details

Prompt Optimizer: LLM-based prompt optimization
Image Retriever: LLM-based image retrieval
Scorer: Scoring of the generated image
Orchestrator: Orchestrates the entire optimization workflow
Pipeline: Orchestrates the entire optimization workflow
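
How the components above could fit together is sketched below. This is a minimal illustration under stated assumptions, not the repository's implementation: the function name run_optimization, the Candidate class, and all four callable signatures are hypothetical, and for simplicity the sketch retrieves reference images on every iteration rather than only for concepts unknown to the base model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Candidate:
    """One attempt: the prompt used, its reference images, the output, and its score."""
    prompt: str
    reference_images: List[str] = field(default_factory=list)
    image: str = ""
    score: float = float("-inf")


def run_optimization(
    prompt: str,
    optimize_prompt: Callable[[str, Optional[Candidate]], str],  # Prompt Optimizer (assumed signature)
    retrieve_images: Callable[[str], List[str]],                 # Image Retriever (assumed signature)
    generate: Callable[[str, List[str]], str],                   # generative backbone (assumed signature)
    score_image: Callable[[str, str], float],                    # Scorer (assumed signature)
    iterations: int = 3,
) -> Candidate:
    """Hypothetical orchestration loop: refine, retrieve, generate, score, keep the best."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    best: Optional[Candidate] = None
    for _ in range(iterations):
        # Refine the prompt, optionally conditioning on the best attempt so far.
        new_prompt = optimize_prompt(prompt, best)
        # Fetch reference images for the original user prompt.
        references = retrieve_images(prompt)
        # Generate an image from the refined prompt and the references.
        image = generate(new_prompt, references)
        # Score the result against the original user prompt.
        candidate = Candidate(new_prompt, references, image, score_image(prompt, image))
        if best is None or candidate.score > best.score:
            best = candidate
    return best
```

Passing the components in as callables keeps the loop independent of any particular LLM, search API, or image backbone, so each piece can be swapped or stubbed out in isolation.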

Dataset

You may access the NICE Benchmark dataset through the following code:

from datasets import load_dataset
dataset = load_dataset("mhsonkyle/NICE")
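
The split names and columns of the NICE benchmark are not documented here, so it can help to inspect them after loading. The helper below is a small sketch for that purpose; summarize_splits is a hypothetical name, and it only assumes that the loaded object maps split names to row collections that support len() and integer indexing, which holds for the DatasetDict returned by load_dataset.

```python
from typing import Any, Dict, List, Mapping, Sequence, Tuple


def summarize_splits(
    dataset: Mapping[str, Sequence[Mapping[str, Any]]],
) -> Dict[str, Tuple[int, List[str]]]:
    """Return {split_name: (number_of_rows, sorted_column_names)} for each split."""
    summary: Dict[str, Tuple[int, List[str]]] = {}
    for split_name, rows in dataset.items():
        # Read column names from the first row; an empty split has no columns to report.
        columns = sorted(rows[0].keys()) if len(rows) > 0 else []
        summary[split_name] = (len(rows), columns)
    return summary
```

With the dataset loaded as shown above, calling summarize_splits(dataset) lists each split with its row count and column names.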

Installation

Prerequisites

Python 3.10

Install with uv (Recommended)

# Clone the repository
git clone https://github.com/mhson-kyle/World-To-Image.git
cd World-To-Image

# Install with uv
uv sync

# Activate virtual environment
source .venv/bin/activate

Configuration

OmniGen2

To use OmniGen2, install it by following the instructions at https://github.com/VectorSpaceLab/OmniGen2/.

Environment Variables

# Azure OpenAI
export AZURE_API_KEY="your-azure-api-key"
export AZURE_API_BASE="https://your-endpoint.openai.azure.com/"
export AZURE_API_VERSION="2024-12-01-preview"

# RapidAPI
export RAPIDAPI_KEY="your-rapidapi-key"
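
A quick check that these variables are set can save a failed run later. The snippet below is a sketch, not part of the repository: missing_env_vars is a hypothetical helper, and treating all four variables as required is an assumption based only on the list above.

```python
import os
from typing import List, Mapping, Optional

# Names taken from the export lines above; assumed (not confirmed) to all be required.
REQUIRED_ENV_VARS = (
    "AZURE_API_KEY",
    "AZURE_API_BASE",
    "AZURE_API_VERSION",
    "RAPIDAPI_KEY",
)


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name, "").strip()]
```

Calling missing_env_vars() with no arguments inspects the current process environment; an empty list means every variable listed above is set.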

Quick Start

Basic Optimization

# Single prompt optimization
python run_single.py 'dr strange' --iterations 3

# Multiple prompts optimization
python run.py --config configs/config_base.yaml
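
For a handful of ad hoc prompts, the single-prompt command can also be driven from Python. This is a sketch that relies only on the run_single.py invocation shown above (a positional prompt plus --iterations); build_command and run_prompts are hypothetical helpers, and the script is assumed to be run from the repository root inside the activated environment.

```python
import subprocess
import sys
from typing import List


def build_command(prompt: str, iterations: int = 3) -> List[str]:
    """Build the argument list for one run_single.py invocation."""
    # sys.executable reuses the interpreter of the active virtual environment.
    return [sys.executable, "run_single.py", prompt, "--iterations", str(iterations)]


def run_prompts(prompts: List[str], iterations: int = 3) -> List[int]:
    """Run each prompt in turn and return the exit code of every run."""
    return [subprocess.run(build_command(p, iterations)).returncode for p in prompts]
```

Passing the prompt as a list element rather than through a shell string avoids quoting problems with prompts that contain spaces or apostrophes.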

Citation

If you find this repository useful, please consider citing:

@article{Son2025World2Image,
  title={WORLD-TO-IMAGE: Grounding Text-to-image Generation with Agent-driven World Knowledge},
  author={Son, Moo Hyun and Oh, Jintaek and Mun, Sun Bin and Roh, Jaechul and Choi, Sehyun},
  archivePrefix={arXiv},
  eprint={2510.04201},
  year={2025},
  url={http://arxiv.org/abs/2510.04201},
}
