## Data Preparation
Download the datasets (MIMIC-CXR, CT-RATE, etc.) and extract them to `data/origin/<data type>/<dataset name>`, where `<data type>` can be `local` for image datasets with localized annotations (bounding boxes, segmentation) and `vision-language` for VQA and radiology report datasets.
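For concreteness, the layout can be sanity-checked before pre-processing with a small script like the following. This is a minimal sketch; the dataset placements shown in the comments are examples, not an exhaustive list:

```python
from pathlib import Path

# Example layout (dataset names and placements are illustrative):
#   data/origin/vision-language/MIMIC-CXR/   <- radiology report / VQA datasets
#   data/origin/local/VinDr-CXR/             <- datasets with localized annotations

ORIGIN = Path("data/origin")

def check_layout(data_type: str, dataset: str) -> Path:
    """Return the dataset directory, failing early if it is missing."""
    path = ORIGIN / data_type / dataset
    if not path.is_dir():
        raise FileNotFoundError(f"Expected dataset at {path}; download and extract it first.")
    return path

check_layout("vision-language", "MIMIC-CXR")
check_layout("local", "VinDr-CXR")
```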
Then execute the pre-processing script for each dataset. For instance, for MIMIC-CXR, execute the script at `scripts/data/vl/MIMIC-CXR/MIMIC-CXR.py`. After pre-processing finishes, the processed data are placed at `data/processed/vision-language/MIMIC-CXR`, where `<split>.json` specifies the data items for each split.
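As an illustration, a split file can be consumed as plain JSON. The item schema varies per dataset, so the field names used below (`image`, `report`) are hypothetical; inspect the JSON for the actual keys:

```python
import json
from pathlib import Path

PROCESSED = Path("data/processed/vision-language/MIMIC-CXR")

def load_split(split: str) -> list[dict]:
    """Load the list of data items for one split (e.g. train / validate / test)."""
    with open(PROCESSED / f"{split}.json") as f:
        return json.load(f)

train_items = load_split("train")
print(f"{len(train_items)} training items")
first = train_items[0]
# Field names are hypothetical; the real schema depends on the dataset.
print(first.get("image"), first.get("report"))
```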
We demonstrate how to follow our proposed pipeline to construct _visually grounded reports_, i.e., textual reports accompanied by localized annotations. Results are saved under `data/processed/visual-grounding`.
First, instruct an LLM (Meta Llama 3 70B in our case) to identify key phrases in the report text that correspond to anatomical structures or abnormality findings in the images. Then, instruct the LLM to filter only the positive targets from the output of the first step.
These two steps can be completed by executing the script at `scripts/data/vg/tag.py`.
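A minimal sketch of what these two prompting steps might look like, assuming Llama 3 70B is served behind an OpenAI-compatible endpoint (e.g. vLLM); the endpoint, model name, and prompts below are illustrative, not the exact ones used by `tag.py`:

```python
from openai import OpenAI

# Assumption: an OpenAI-compatible server hosting Llama 3 70B locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "meta-llama/Meta-Llama-3-70B-Instruct"

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def tag_report(report: str) -> str:
    # Step 1: extract key phrases referring to anatomy or abnormal findings.
    phrases = ask(
        "List the key phrases in this radiology report that refer to "
        f"anatomical structures or abnormality findings:\n{report}"
    )
    # Step 2: keep only positive targets (drop negated or absent findings).
    return ask(
        "From the following phrases, keep only those describing findings "
        f"that are actually present (positive):\n{phrases}"
    )
```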
### Localized Annotations Generation
After the phrases to be grounded are identified in the report text, use pre-trained models to generate the corresponding pseudo labels.
For CT-RATE, execute `scripts/data/vg/CT-RATE/sat/inference.py` to generate segmentation masks of anatomical structures, using the pre-trained [SAT](https://github.com/zhaoziheng/SAT) model.
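Once the script has run, the generated masks can be consumed as ordinary NIfTI volumes. A sketch assuming one binary mask file per anatomical structure (the output path and file layout here are hypothetical, not SAT's documented format):

```python
import nibabel as nib
import numpy as np

def mask_summary(mask_path: str) -> dict:
    """Summarize one segmentation mask produced by the SAT inference script."""
    mask = nib.load(mask_path).get_fdata() > 0  # binarize
    voxels = int(mask.sum())
    if voxels == 0:
        return {"present": False}
    coords = np.argwhere(mask)
    # Tight 3D bounding box of the structure, as (start, stop) per axis.
    bbox = list(zip(coords.min(0).tolist(), (coords.max(0) + 1).tolist()))
    return {"present": True, "voxels": voxels, "bbox": bbox}

# Hypothetical output location of the inference script:
print(mask_summary("data/processed/visual-grounding/CT-RATE/example/liver.nii.gz"))
```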
For MIMIC-CXR, we train a DINO disease-detection model with the [detrex](https://github.com/IDEA-Research/detrex) framework on the [VinDr-CXR](https://vindr.ai/datasets/cxr) dataset, and use the bounding boxes generated by this model as pseudo labels. The inference script is at `scripts/data/vg/MIMIC-CXR/detrex/tools/MIMIC-CXR-vg/infer.py`.
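Downstream of the inference script, pairing predicted boxes with the positive phrases tagged earlier reduces to simple filtering. A sketch with hypothetical field names (detrex's actual output format may differ):

```python
def ground_phrases(detections: list[dict], phrases: set[str], thr: float = 0.5) -> dict:
    """Attach confident detections to the positive phrases tagged earlier.

    Each detection is assumed to look like
    {"label": "Cardiomegaly", "score": 0.83, "box": [x1, y1, x2, y2]};
    these field names are hypothetical.
    """
    grounded: dict[str, list[list[float]]] = {}
    for det in detections:
        if det["score"] >= thr and det["label"] in phrases:
            grounded.setdefault(det["label"], []).append(det["box"])
    return grounded

boxes = ground_phrases(
    [{"label": "Cardiomegaly", "score": 0.83, "box": [120.0, 200.0, 380.0, 430.0]}],
    {"Cardiomegaly"},
)
print(boxes)  # {'Cardiomegaly': [[120.0, 200.0, 380.0, 430.0]]}
```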