😻 LLaVA-ICL (Towards Multimodal In-Context Learning for Vision & Language Models)

Towards Multimodal In-Context Learning for Vision & Language Models. Paper page: https://arxiv.org/abs/2403.12736


Installation

git clone git@github.com:SivanDoveh/LLaVA-ICL.git
cd LLaVA-ICL

Data Preparation

Directory Structure

This project requires specific datasets to be organized in a particular directory structure. Please follow these steps:

  1. Download the following datasets:

    • CUB-200-2011
    • Oxford 102 Flower
    • Stanford Dogs Dataset
    • Food-101
    • Stanford Cars
  2. Create and organize the project directory as shown below:

LLaVA-ICL/
├── ALL LLaVA files and folders/
│   └── ...
├── FS_pkls/
│   ├── CUB_2way_1shot_episodes.pkl
│   └── ...
├── data/
│   ├── CUB/
│   │   └── CUB_200_2011/
│   │       └── images/
│   │           ├── 17.Clay_colored_Sparrow/
│   │           └── ...
│   ├── flowers/
│   │   └── jpg/
│   ├── stanford_dogs/
│   │   └── Images/
│   │       ├── n02097298-Scotch_terrier/
│   │       └── ...
│   ├── food_101/
│   │   └── images/
│   │       ├── caesar_salad/
│   │       └── ...
│   └── stanford_cars/
│       └── images/
└── ...

Place each dataset in its corresponding directory under data/ (a quick sanity-check sketch follows this list):

  • CUB-200-2011 → data/CUB/CUB_200_2011/images/
  • Oxford 102 Flower → data/flowers/jpg/
  • Stanford Dogs → data/stanford_dogs/Images/
  • Food-101 → data/food_101/images/
  • Stanford Cars → data/stanford_cars/images/
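A minimal sanity check (a sketch only; it assumes the repository root is the current directory and simply verifies that each dataset folder from the layout above exists and is non-empty):

import os

# Expected dataset locations from the layout above (relative to the LLaVA-ICL root).
expected_dirs = [
    "data/CUB/CUB_200_2011/images",
    "data/flowers/jpg",
    "data/stanford_dogs/Images",
    "data/food_101/images",
    "data/stanford_cars/images",
]

for rel in expected_dirs:
    if not os.path.isdir(rel):
        print(f"MISSING: {rel}")
    else:
        print(f"OK: {rel} ({len(os.listdir(rel))} entries)")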

Few Shot Task Files

Important: Download the FS_pkls from here

  • These .pkl files contain the few-shot TEST tasks built from all 5 datasets, in the format that ICL_model_vqa_FS.py expects for evaluation.
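To sanity-check the download, a minimal sketch for loading one of the episode files with the standard pickle module (it assumes the episodes are stored as a plain Python list of episode dictionaries in the format described under "Few Shot ICL Classification Evaluations on YOUR data" below):

import pickle

# Path of one of the downloaded FS task files.
with open("./FS_pkls/CUB_2way_1shot_episodes.pkl", "rb") as f:
    episodes = pickle.load(f)

print(f"loaded {len(episodes)} episodes")
# Expected per-episode keys (see the episode format later in this README):
# test_image, test_class, positive_images, negs.
print(episodes[0])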

Training data

  • The training_data_mix folder contains LLaVA-style training data (a mix of multiple-choice, question-answering, and captioning tasks built from VL-Checklist and SEED-Bench (1-4)).
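For reference, LLaVA-style training records are JSON entries with an "id", an "image" path, and a "conversations" list of human/gpt turns. The example below only illustrates that generic shape; the image path, question, and options are made-up placeholders, not taken from training_data_mix:

import json

# Illustrative LLaVA-style training record (placeholder values).
sample = {
    "id": "000001",
    "image": "data/CUB/CUB_200_2011/images/17.Clay_colored_Sparrow/img_0001.jpg",
    "conversations": [
        {
            "from": "human",
            "value": "<image>\nWhat is the type of the bird in the image?\n"
                     "A. Clay colored Sparrow\nB. Painted Bunting\n"
                     "Answer with the option's letter from the given choices directly.",
        },
        {"from": "gpt", "value": "A"},
    ],
}
print(json.dumps(sample, indent=2))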

Few Shot Classification Evaluations on our FS-ICL data

  • To evaluate our LLaVA-ICL model on a single few-shot dataset (episodes_path = the path to the corresponding FS .pkl file you downloaded from the drive), use this command:
python llava/eval/ICL_model_vqa_FS.py --question_prompt '{question_prompts}' \
--episodes_path {path to FS single dataset (CUB/flowers/cars/...)} \
--model-path {model_path} --output_file 'output_file_name.json'
  • Example: running evaluation on our FS-ICL CUB dataset:
python llava/eval/ICL_model_vqa_FS.py --question_prompt 'What is the type of the bird in the image?' \
--episodes_path './FS_pkls/CUB_2way_1shot_episodes.pkl' \
--model-path path/to/model/folder/train_llava_icl_mix_llava_seed_Vl_ALL_QA_MC_NEW_Cap --output_file 'out.json'

Question prompts used for FS-ICL classification evaluation:

question_prompts = ["What is the breed of the dog in the image?",
                    "What is the type of the bird in the image?",
                    "What is the type of the flower in the image?",
                    "What is the type of the food in the image?",
                    "What is the model of the car in the image?"]

Few Shot ICL Classification Evaluations on YOUR data

  • Prepare a list of dictionaries in this format (a sketch for building and pickling such a file appears after the conversation example below):
[{'test_image': 'path/to/image/query_image.jpg',
  'test_class': 'class of test image (same as the positive example class)',
  'positive_images': ['path/to/positive class image'],
  'negs': [{'neg_images': ['path/to/negative class image'],
            'neg_class': 'class of negative image'}]}]
  • How does the prompt need to look before it goes into LLaVA-ICL?
    • The FS dataset and the data loader in ICL_model_vqa_FS convert your pickle file into a conversation like the one below (the images are inserted as a list of three images):
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER: <image>
What is the type of the flower in the image?
A. pink-yellow dahlia
B. balloon flower
 Answer with the option's letter from the given choices directly. ASSISTANT: A</s>USER: <image>
What is the type of the flower in the image?
A. pink-yellow dahlia
B. balloon flower
 Answer with the option's letter from the given choices directly. ASSISTANT: B</s>USER: <image>
What is the type of the flower in the image?
A. pink-yellow dahlia
B. balloon flower
 Answer with the option's letter from the given choices directly. ASSISTANT:
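A minimal sketch for building and saving your own episodes file in the format above (all image paths and class names below are placeholders):

import pickle

# One 2-way 1-shot episode; add one dictionary per episode you want to evaluate.
episodes = [
    {
        "test_image": "path/to/image/query_image.jpg",
        "test_class": "pink-yellow dahlia",  # same class as the positive example
        "positive_images": ["path/to/positive/dahlia_support.jpg"],
        "negs": [
            {
                "neg_images": ["path/to/negative/balloon_flower_support.jpg"],
                "neg_class": "balloon flower",
            }
        ],
    },
]

with open("my_episodes.pkl", "wb") as f:
    pickle.dump(episodes, f)

# Evaluate it the same way as the FS-ICL datasets:
#   python llava/eval/ICL_model_vqa_FS.py --question_prompt 'What is the type of the flower in the image?' \
#     --episodes_path './my_episodes.pkl' --model-path path/to/model/folder --output_file 'out.json'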
