
Commit 0a2d15d (parent 27534e2): add guide on grounded reports construction

1 file changed: 19 additions, 1 deletion

README.md

## Data Preparation

Download the datasets (MIMIC-CXR, CT-RATE, etc.) and extract them to `data/origin/<data type>/<dataset name>`, where `<data type>` can be `local` for image datasets with localized annotations (bounding boxes, segmentation) and `vision-language` for VQA and radiology report datasets.
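For instance, the commands below create placeholder directories for a few datasets; which dataset belongs under which `<data type>` here is illustrative (VinDr-CXR ships bounding-box annotations, while MIMIC-CXR and CT-RATE are report datasets):

```shell
# Illustrative layout: data/origin/<data type>/<dataset name>
mkdir -p data/origin/local/VinDr-CXR            # localized annotations
mkdir -p data/origin/vision-language/MIMIC-CXR  # radiology reports
mkdir -p data/origin/vision-language/CT-RATE    # radiology reports
```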

Then execute the pre-processing script for each dataset. For instance, for MIMIC-CXR, execute the script at `scripts/data/vl/MIMIC-CXR/MIMIC-CXR.py`. After pre-processing finishes, the pre-processed data are placed at `data/processed/vision-language/MIMIC-CXR`, where `<split>.json` specifies the data items for each split.
```shell
# Stage 3: Alignment (grounded report generation)
python scripts/cli.py fit -c conf/phase-grg/fit.yaml --compile false --data.dataloader.train_batch_size ... --trainer.accumulate_grad_batches ... --seed_everything $RANDOM --model.freeze_sam false --model.freeze_isam false
```
## Grounded Reports Construction
We demonstrate how to follow our proposed pipeline to construct _visually grounded reports_, i.e., textual reports accompanied by localized annotations. The results are saved under `data/processed/visual-grounding`.
### Key Phrase Identification & Positive Target Filtering
First, instruct an LLM (Meta Llama 3 70B in our case) to identify key phrases in the report text that correspond to anatomical structures or abnormality findings in the images. Then, instruct the LLM to keep only the positive targets from the output of the previous step.

These two steps can be completed by executing the script at `scripts/data/vg/tag.py`.
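The flow of these two steps can be sketched as a prompt builder plus a lenient parser for the model's reply; the prompt wording and the JSON-array reply format below are assumptions for illustration, not the actual prompts in `scripts/data/vg/tag.py`:

```python
import json


def build_tag_prompt(report):
    # Hypothetical prompt wording; the real prompts used with
    # Meta Llama 3 70B live in scripts/data/vg/tag.py.
    return (
        "Identify the key phrases in the radiology report below that refer to "
        "anatomical structures or abnormality findings. "
        "Answer with a JSON array of strings.\n\nReport:\n" + report
    )


def parse_phrases(llm_output):
    """Extract the JSON array of phrases from a possibly chatty LLM reply."""
    start = llm_output.find("[")
    end = llm_output.rfind("]") + 1
    return json.loads(llm_output[start:end])
```

The positive-target filtering step can reuse the same pattern with a second prompt that asks the model to keep only phrases describing findings actually present on the image.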
### Localized Annotations Generation
After the phrases to be grounded are identified in the report text, use pre-trained models to generate the corresponding pseudo labels.

For CT-RATE, execute `scripts/data/vg/CT-RATE/sat/inference.py` to generate segmentation masks of anatomical structures with the pre-trained [SAT](https://github.com/zhaoziheng/SAT) model.

For MIMIC-CXR, we train a DINO detection model for diseases with the [detrex](https://github.com/IDEA-Research/detrex) framework on the [VinDr-CXR](https://vindr.ai/datasets/cxr) dataset, and use the bounding boxes it generates. The inference script is at `scripts/data/vg/MIMIC-CXR/detrex/tools/MIMIC-CXR-vg/infer.py`.
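Conceptually, this step pairs each positive phrase with the pseudo labels produced for it. In the sketch below, the `GroundedPhrase` container and the `detections` mapping are hypothetical illustrations, not the repository's actual data schema:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GroundedPhrase:
    text: str
    boxes: list = field(default_factory=list)  # (x1, y1, x2, y2) boxes, e.g. from DINO
    mask_path: Optional[str] = None            # segmentation mask file, e.g. from SAT


def attach_annotations(positive_phrases, detections):
    """Pair each positive phrase with its pseudo labels.

    `detections` maps phrase text -> list of bounding boxes; phrases
    without detections keep an empty list.
    """
    return [
        GroundedPhrase(p, boxes=detections.get(p, []))
        for p in positive_phrases
    ]
```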
