[Project] Medical semantic seg dataset: Pcam (#2684)

Masaaki-75 · web-flow · commit 30e3b49b0b16 · 2023-06-25T15:44:43.000+08:00
diff --git a/projects/medical/2d_image/histopathology/pcam/README.md b/projects/medical/2d_image/histopathology/pcam/README.md
@@ -0,0 +1,153 @@
+# PCam (PatchCamelyon)
+
+## Description
+
+This project supports **`Patch Camelyon (PCam) `**, which can be downloaded from [here](https://opendatalab.com/PCam).
+
+### Dataset Overview
+
+PatchCamelyon is an image classification dataset. It consists of 327680 color images (96 x 96px) extracted from histopathologic scans of lymph node sections. Each image is annotated with a binary label indicating presence of metastatic tissue. PCam provides a new benchmark for machine learning models: bigger than CIFAR10, smaller than ImageNet, trainable on a single GPU.
+
+### Statistic Information
+
+| Dataset Name                         | Anatomical Region | Task Type    | Modality       | Num. Classes | Train/Val/Test images | Train/Val/Test Labeled | Release Date | License                                                       |
+| ------------------------------------ | ----------------- | ------------ | -------------- | ------------ | --------------------- | ---------------------- | ------------ | ------------------------------------------------------------- |
+| [Pcam](https://opendatalab.com/PCam) | throax            | segmentation | histopathology | 2            | 327680/-/-            | yes/-/-                | 2018         | [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/) |
+
+|    Class Name     | Num. Train | Pct. Train | Num. Val | Pct. Val | Num. Test | Pct. Test |
+| :---------------: | :--------: | :--------: | :------: | :------: | :-------: | :-------: |
+|    background     |   214849   |   63.77    |    -     |    -     |     -     |     -     |
+| metastatic tissue |   131832   |   36.22    |    -     |    -     |     -     |     -     |
+
+Note:
+
+- `Pct` means percentage of pixels in this category in all pixels.
+
+### Visualization
+
+![pcam](https://raw.githubusercontent.com/uni-medical/medical-datasets-visualization/main/2d/semantic_seg/histopathology/pcam/pcam_dataset.png?raw=true)
+
+### Dataset Citation
+
+```
+@inproceedings{veeling2018rotation,
+	title={Rotation equivariant CNNs for digital pathology},
+	author={Veeling, Bastiaan S and Linmans, Jasper and Winkens, Jim and Cohen, Taco and Welling, Max},
+	booktitle={International Conference on Medical image computing and computer-assisted intervention},
+	pages={210--218},
+	year={2018},
+}
+```
+
+### Prerequisites
+
+- Python v3.8
+- PyTorch v1.10.0
+- pillow(PIL) v9.3.0 9.3.0
+- scikit-learn(sklearn) v1.2.0 1.2.0
+- [MIM](https://github.com/open-mmlab/mim) v0.3.4
+- [MMCV](https://github.com/open-mmlab/mmcv) v2.0.0rc4
+- [MMEngine](https://github.com/open-mmlab/mmengine) v0.2.0 or higher
+- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation) v1.0.0rc5
+
+All the commands below rely on the correct configuration of `PYTHONPATH`, which should point to the project's directory so that Python can locate the module files. In `pcam/` root directory, run the following line to add the current directory to `PYTHONPATH`:
+
+```shell
+export PYTHONPATH=`pwd`:$PYTHONPATH
+```
+
+### Dataset Preparing
+
+- download dataset from [here](https://opendatalab.com/PCam) and decompress data to path `'data/'`.
+- run script `"python tools/prepare_dataset.py"` to format data and change folder structure as below.
+- run script `"python ../../tools/split_seg_dataset.py"` to split dataset and generate `train.txt`, `val.txt` and `test.txt`. If the label of official validation set and test set cannot be obtained, we generate `train.txt` and `val.txt` from the training set randomly.
+
+```shell
+mkdir data & cd data
+pip install opendatalab
+odl get PCam
+mv ./PCam/raw/pcamv1 ./
+rm -rf PCam
+cd ..
+python tools/prepare_dataset.py
+python ../../tools/split_seg_dataset.py
+```
+
+```none
+  mmsegmentation
+  ├── mmseg
+  ├── projects
+  │   ├── medical
+  │   │   ├── 2d_image
+  │   │   │   ├── histopathology
+  │   │   │   │   ├── pcam
+  │   │   │   │   │   ├── configs
+  │   │   │   │   │   ├── datasets
+  │   │   │   │   │   ├── tools
+  │   │   │   │   │   ├── data
+  │   │   │   │   │   │   ├── train.txt
+  │   │   │   │   │   │   ├── val.txt
+  │   │   │   │   │   │   ├── images
+  │   │   │   │   │   │   │   ├── train
+  │   │   │   │   |   │   │   │   ├── xxx.png
+  │   │   │   │   |   │   │   │   ├── ...
+  │   │   │   │   |   │   │   │   └── xxx.png
+  │   │   │   │   │   │   ├── masks
+  │   │   │   │   │   │   │   ├── train
+  │   │   │   │   |   │   │   │   ├── xxx.png
+  │   │   │   │   |   │   │   │   ├── ...
+  │   │   │   │   |   │   │   │   └── xxx.png
+```
+
+### Divided Dataset Information
+
+***Note: The table information below is divided by ourselves.***
+
+|    Class Name     | Num. Train | Pct. Train | Num. Val | Pct. Val | Num. Test | Pct. Test |
+| :---------------: | :--------: | :--------: | :------: | :------: | :-------: | :-------: |
+|    background     |   171948   |   63.82    |  42901   |   63.6   |     -     |     -     |
+| metastatic tissue |   105371   |   36.18    |  26461   |   36.4   |     -     |     -     |
+
+### Training commands
+
+To train models on a single server with one GPU. (default)
+
+```shell
+mim train mmseg ./configs/${CONFIG_FILE}
+```
+
+### Testing commands
+
+To test models on a single server with one GPU. (default)
+
+```shell
+mim test mmseg ./configs/${CONFIG_FILE}  --checkpoint ${CHECKPOINT_PATH}
+```
+
+<!-- List the results as usually done in other model's README. [Example](https://github.com/open-mmlab/mmsegmentation/tree/dev-1.x/configs/fcn#results-and-models)
+
+You should claim whether this is based on the pre-trained weights, which are converted from the official release; or it's a reproduced result obtained from retraining the model in this project. -->
+
+## Checklist
+
+- [x] Milestone 1: PR-ready, and acceptable to be one of the `projects/`.
+
+  - [x] Finish the code
+  - [x] Basic docstrings & proper citation
+  - [ ] Test-time correctness
+  - [x] A full README
+
+- [ ] Milestone 2: Indicates a successful model implementation.
+
+  - [ ] Training-time correctness
+
+- [ ] Milestone 3: Good to be a part of our core package!
+
+  - [ ] Type hints and docstrings
+  - [ ] Unit tests
+  - [ ] Code polishing
+  - [ ] Metafile.yml
+
+- [ ] Move your modules into the core package following the codebase's file hierarchy structure.
+
+- [ ] Refactor your modules into the core package following the codebase's file hierarchy structure.
diff --git a/projects/medical/2d_image/histopathology/pcam/configs/fcn-unet-s5-d16_unet_1xb16-0.0001-20k_pcam-512x512.py b/projects/medical/2d_image/histopathology/pcam/configs/fcn-unet-s5-d16_unet_1xb16-0.0001-20k_pcam-512x512.py
@@ -0,0 +1,17 @@
+_base_ = [
+    'mmseg::_base_/models/fcn_unet_s5-d16.py', './pcam_512x512.py',
+    'mmseg::_base_/default_runtime.py',
+    'mmseg::_base_/schedules/schedule_20k.py'
+]
+custom_imports = dict(imports='datasets.pcam_dataset')
+img_scale = (512, 512)
+data_preprocessor = dict(size=img_scale)
+optimizer = dict(lr=0.0001)
+optim_wrapper = dict(optimizer=optimizer)
+model = dict(
+    data_preprocessor=data_preprocessor,
+    decode_head=dict(num_classes=2),
+    auxiliary_head=None,
+    test_cfg=dict(mode='whole', _delete_=True))
+vis_backends = None
+visualizer = dict(vis_backends=vis_backends)
diff --git a/projects/medical/2d_image/histopathology/pcam/configs/fcn-unet-s5-d16_unet_1xb16-0.001-20k_pcam-512x512.py b/projects/medical/2d_image/histopathology/pcam/configs/fcn-unet-s5-d16_unet_1xb16-0.001-20k_pcam-512x512.py
@@ -0,0 +1,17 @@
+_base_ = [
+    'mmseg::_base_/models/fcn_unet_s5-d16.py', './pcam_512x512.py',
+    'mmseg::_base_/default_runtime.py',
+    'mmseg::_base_/schedules/schedule_20k.py'
+]
+custom_imports = dict(imports='datasets.pcam_dataset')
+img_scale = (512, 512)
+data_preprocessor = dict(size=img_scale)
+optimizer = dict(lr=0.001)
+optim_wrapper = dict(optimizer=optimizer)
+model = dict(
+    data_preprocessor=data_preprocessor,
+    decode_head=dict(num_classes=2),
+    auxiliary_head=None,
+    test_cfg=dict(mode='whole', _delete_=True))
+vis_backends = None
+visualizer = dict(vis_backends=vis_backends)
diff --git a/projects/medical/2d_image/histopathology/pcam/configs/fcn-unet-s5-d16_unet_1xb16-0.01-20k_pcam-512x512.py b/projects/medical/2d_image/histopathology/pcam/configs/fcn-unet-s5-d16_unet_1xb16-0.01-20k_pcam-512x512.py
@@ -0,0 +1,17 @@
+_base_ = [
+    'mmseg::_base_/models/fcn_unet_s5-d16.py', './pcam_512x512.py',
+    'mmseg::_base_/default_runtime.py',
+    'mmseg::_base_/schedules/schedule_20k.py'
+]
+custom_imports = dict(imports='datasets.pcam_dataset')
+img_scale = (512, 512)
+data_preprocessor = dict(size=img_scale)
+optimizer = dict(lr=0.01)
+optim_wrapper = dict(optimizer=optimizer)
+model = dict(
+    data_preprocessor=data_preprocessor,
+    decode_head=dict(num_classes=2),
+    auxiliary_head=None,
+    test_cfg=dict(mode='whole', _delete_=True))
+vis_backends = None
+visualizer = dict(vis_backends=vis_backends)
diff --git a/projects/medical/2d_image/histopathology/pcam/configs/fcn-unet-s5-d16_unet_1xb16-0.01lr-sigmoid-20k_pcam-512x512.py b/projects/medical/2d_image/histopathology/pcam/configs/fcn-unet-s5-d16_unet_1xb16-0.01lr-sigmoid-20k_pcam-512x512.py
@@ -0,0 +1,18 @@
+_base_ = [
+    'mmseg::_base_/models/fcn_unet_s5-d16.py', './pcam_512x512.py',
+    'mmseg::_base_/default_runtime.py',
+    'mmseg::_base_/schedules/schedule_20k.py'
+]
+custom_imports = dict(imports='datasets.pcam_dataset')
+img_scale = (512, 512)
+data_preprocessor = dict(size=img_scale)
+optimizer = dict(lr=0.01)
+optim_wrapper = dict(optimizer=optimizer)
+model = dict(
+    data_preprocessor=data_preprocessor,
+    decode_head=dict(
+        num_classes=2, loss_decode=dict(use_sigmoid=True), out_channels=1),
+    auxiliary_head=None,
+    test_cfg=dict(mode='whole', _delete_=True))
+vis_backends = None
+visualizer = dict(vis_backends=vis_backends)
diff --git a/projects/medical/2d_image/histopathology/pcam/configs/pcam_512x512.py b/projects/medical/2d_image/histopathology/pcam/configs/pcam_512x512.py
@@ -0,0 +1,42 @@
+dataset_type = 'PCamDataset'
+data_root = 'data/'
+img_scale = (512, 512)
+train_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(type='LoadAnnotations'),
+    dict(type='Resize', scale=img_scale, keep_ratio=False),
+    dict(type='RandomFlip', prob=0.5),
+    dict(type='PhotoMetricDistortion'),
+    dict(type='PackSegInputs')
+]
+test_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(type='Resize', scale=img_scale, keep_ratio=False),
+    dict(type='LoadAnnotations'),
+    dict(type='PackSegInputs')
+]
+train_dataloader = dict(
+    batch_size=16,
+    num_workers=4,
+    persistent_workers=True,
+    sampler=dict(type='InfiniteSampler', shuffle=True),
+    dataset=dict(
+        type=dataset_type,
+        data_root=data_root,
+        ann_file='train.txt',
+        data_prefix=dict(img_path='images/', seg_map_path='masks/'),
+        pipeline=train_pipeline))
+val_dataloader = dict(
+    batch_size=1,
+    num_workers=4,
+    persistent_workers=True,
+    sampler=dict(type='DefaultSampler', shuffle=False),
+    dataset=dict(
+        type=dataset_type,
+        data_root=data_root,
+        ann_file='val.txt',
+        data_prefix=dict(img_path='images/', seg_map_path='masks/'),
+        pipeline=test_pipeline))
+test_dataloader = val_dataloader
+val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU', 'mDice'])
+test_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU', 'mDice'])
diff --git a/projects/medical/2d_image/histopathology/pcam/datasets/pcam_dataset.py b/projects/medical/2d_image/histopathology/pcam/datasets/pcam_dataset.py
@@ -0,0 +1,31 @@
+from mmseg.datasets import BaseSegDataset
+from mmseg.registry import DATASETS
+
+
+@DATASETS.register_module()
+class PCamDataset(BaseSegDataset):
+    """PCamDataset dataset.
+
+    In segmentation map annotation for PCamDataset,
+    0 stands for background, which is included in 2 categories.
+    ``reduce_zero_label`` is fixed to False. The ``img_suffix``
+    is fixed to '.png' and ``seg_map_suffix`` is fixed to '.png'.
+
+    Args:
+        img_suffix (str): Suffix of images. Default: '.png'
+        seg_map_suffix (str): Suffix of segmentation maps. Default: '.png'
+        reduce_zero_label (bool): Whether to mark label zero as ignored.
+            Default to False.
+    """
+    METAINFO = dict(classes=('background', 'metastatic tissue'))
+
+    def __init__(self,
+                 img_suffix='.png',
+                 seg_map_suffix='.png',
+                 reduce_zero_label=False,
+                 **kwargs) -> None:
+        super().__init__(
+            img_suffix=img_suffix,
+            seg_map_suffix=seg_map_suffix,
+            reduce_zero_label=reduce_zero_label,
+            **kwargs)
diff --git a/projects/medical/2d_image/histopathology/pcam/tools/prepare_dataset.py b/projects/medical/2d_image/histopathology/pcam/tools/prepare_dataset.py
@@ -0,0 +1,49 @@
+import os
+
+import h5py
+import numpy as np
+from PIL import Image
+
+root_path = 'data/'
+
+tgt_img_train_dir = os.path.join(root_path, 'images/train/')
+tgt_mask_train_dir = os.path.join(root_path, 'masks/train/')
+tgt_img_val_dir = os.path.join(root_path, 'images/val/')
+tgt_img_test_dir = os.path.join(root_path, 'images/test/')
+
+os.system('mkdir -p ' + tgt_img_train_dir)
+os.system('mkdir -p ' + tgt_mask_train_dir)
+os.system('mkdir -p ' + tgt_img_val_dir)
+os.system('mkdir -p ' + tgt_img_test_dir)
+
+
+def extract_pics_from_h5(h5_path, h5_key, save_dir):
+    f = h5py.File(h5_path, 'r')
+    for i, img in enumerate(f[h5_key]):
+        img = img.astype(np.uint8).squeeze()
+        img = Image.fromarray(img)
+        save_image_path = os.path.join(save_dir, str(i).zfill(8) + '.png')
+        img.save(save_image_path)
+
+
+if __name__ == '__main__':
+
+    extract_pics_from_h5(
+        'data/pcamv1/camelyonpatch_level_2_split_train_x.h5',
+        h5_key='x',
+        save_dir=tgt_img_train_dir)
+
+    extract_pics_from_h5(
+        'data/pcamv1/camelyonpatch_level_2_split_valid_x.h5',
+        h5_key='x',
+        save_dir=tgt_img_val_dir)
+
+    extract_pics_from_h5(
+        'data/pcamv1/camelyonpatch_level_2_split_test_x.h5',
+        h5_key='x',
+        save_dir=tgt_img_test_dir)
+
+    extract_pics_from_h5(
+        'data/pcamv1/camelyonpatch_level_2_split_train_mask.h5',
+        h5_key='mask',
+        save_dir=tgt_mask_train_dir)
diff --git a/projects/medical/2d_image/microscopy_images/2pm_vessel/README.md b/projects/medical/2d_image/microscopy_images/2pm_vessel/README.md
@@ -69,7 +69,7 @@ pip install opendatalab
 odl get    2-PM_Vessel_Dataset
 cd ..
 python tools/prepare_dataset.py
-python tools/prepare_dataset.py
+python ../../tools/split_seg_dataset.py
 ```
 
 ```none