## Data Preparation
Download the datasets (MIMIC-CXR, CT-RATE, etc.) and extract them to `data/origin/<data type>/<dataset name>`, where `<data type>` can be `local` for image datasets with localized annotations (bounding boxes, segmentation) and `vision-language` for VQA and radiology report datasets.
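For concreteness, the layout can be sanity-checked before pre-processing with a small script like the following. This is a minimal sketch; the dataset placements shown in the comments are examples, not an exhaustive list:

```python
from pathlib import Path

# Example layout (dataset names and placements are illustrative):
#   data/origin/vision-language/MIMIC-CXR/   <- radiology report / VQA datasets
#   data/origin/local/VinDr-CXR/             <- datasets with localized annotations

ORIGIN = Path("data/origin")

def check_layout(data_type: str, dataset: str) -> Path:
    """Return the dataset directory, failing early if it is missing."""
    path = ORIGIN / data_type / dataset
    if not path.is_dir():
        raise FileNotFoundError(f"Expected dataset at {path}; download and extract it first.")
    return path

check_layout("vision-language", "MIMIC-CXR")
check_layout("local", "VinDr-CXR")
```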
Then execute the pre-processing script for each dataset. For instance, for MIMIC-CXR, execute the script at `scripts/data/vl/MIMIC-CXR/MIMIC-CXR.py`. After pre-processing finishes, the processed data are placed at `data/processed/vision-language/MIMIC-CXR`, where `<split>.json` specifies the data items for each split.
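As an illustration, a split file can be consumed as plain JSON. The item schema varies per dataset, so the field names used below (`image`, `report`) are hypothetical; inspect the JSON for the actual keys:

```python
import json
from pathlib import Path

PROCESSED = Path("data/processed/vision-language/MIMIC-CXR")

def load_split(split: str) -> list[dict]:
    """Load the list of data items for one split (e.g. train / validate / test)."""
    with open(PROCESSED / f"{split}.json") as f:
        return json.load(f)

train_items = load_split("train")
print(f"{len(train_items)} training items")
first = train_items[0]
# Field names are hypothetical; the real schema depends on the dataset.
print(first.get("image"), first.get("report"))
```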
We demonstrate how to follow our proposed pipeline to construct _visually grounded reports_, i.e., textual reports accompanied by localized annotations. Results are saved under `data/processed/visual-grounding`.
First, instruct an LLM (Meta Llama 3 70B in our case) to identify key phrases in the report text that correspond to anatomical structures or abnormality findings in the images. Then, instruct the LLM to filter only the positive targets from the output of the first step.
These two steps can be completed by executing the script at `scripts/data/vg/tag.py`.
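A minimal sketch of what these two prompting steps might look like, assuming Llama 3 70B is served behind an OpenAI-compatible endpoint (e.g. vLLM); the endpoint, model name, and prompts below are illustrative, not the exact ones used by `tag.py`:

```python
from openai import OpenAI

# Assumption: an OpenAI-compatible server hosting Llama 3 70B locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "meta-llama/Meta-Llama-3-70B-Instruct"

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def tag_report(report: str) -> str:
    # Step 1: extract key phrases referring to anatomy or abnormal findings.
    phrases = ask(
        "List the key phrases in this radiology report that refer to "
        f"anatomical structures or abnormality findings:\n{report}"
    )
    # Step 2: keep only positive targets (drop negated or absent findings).
    return ask(
        "From the following phrases, keep only those describing findings "
        f"that are actually present (positive):\n{phrases}"
    )
```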
### Localized Annotations Generation
After the phrases to be grounded are identified in the report text, use pre-trained models to generate the corresponding pseudo labels.
For CT-RATE, execute `scripts/data/vg/CT-RATE/sat/inference.py` to generate segmentation masks of anatomical structures, using the pre-trained [SAT](https://github.com/zhaoziheng/SAT) model.
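Once the script has run, the generated masks can be consumed as ordinary NIfTI volumes. A sketch assuming one binary mask file per anatomical structure (the output path and file layout here are hypothetical, not SAT's documented format):

```python
import nibabel as nib
import numpy as np

def mask_summary(mask_path: str) -> dict:
    """Summarize one segmentation mask produced by the SAT inference script."""
    mask = nib.load(mask_path).get_fdata() > 0  # binarize
    voxels = int(mask.sum())
    if voxels == 0:
        return {"present": False}
    coords = np.argwhere(mask)
    # Tight 3D bounding box of the structure, as (start, stop) per axis.
    bbox = list(zip(coords.min(0).tolist(), (coords.max(0) + 1).tolist()))
    return {"present": True, "voxels": voxels, "bbox": bbox}

# Hypothetical output location of the inference script:
print(mask_summary("data/processed/visual-grounding/CT-RATE/example/liver.nii.gz"))
```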
For MIMIC-CXR, we train a DINO disease-detection model with the [detrex](https://github.com/IDEA-Research/detrex) framework on the [VinDr-CXR](https://vindr.ai/datasets/cxr) dataset, and use the bounding boxes generated by this model as pseudo labels. The inference script is at `scripts/data/vg/MIMIC-CXR/detrex/tools/MIMIC-CXR-vg/infer.py`.
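Downstream of the inference script, pairing predicted boxes with the positive phrases tagged earlier reduces to simple filtering. A sketch with hypothetical field names (detrex's actual output format may differ):

```python
def ground_phrases(detections: list[dict], phrases: set[str], thr: float = 0.5) -> dict:
    """Attach confident detections to the positive phrases tagged earlier.

    Each detection is assumed to look like
    {"label": "Cardiomegaly", "score": 0.83, "box": [x1, y1, x2, y2]};
    these field names are hypothetical.
    """
    grounded: dict[str, list[list[float]]] = {}
    for det in detections:
        if det["score"] >= thr and det["label"] in phrases:
            grounded.setdefault(det["label"], []).append(det["box"])
    return grounded

boxes = ground_phrases(
    [{"label": "Cardiomegaly", "score": 0.83, "box": [120.0, 200.0, 380.0, 430.0]}],
    {"Cardiomegaly"},
)
print(boxes)  # {'Cardiomegaly': [[120.0, 200.0, 380.0, 430.0]]}
```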