[Project] Medical semantic seg dataset: chest_image_pneum (open-mmlab#2727)

tianbinli · web-flow · commit 65c8d77d6248 · 2023-06-20T21:13:32.000+08:00
diff --git a/projects/medical/2d_image/x_ray/chest_image_pneum/README.md b/projects/medical/2d_image/x_ray/chest_image_pneum/README.md
@@ -0,0 +1,147 @@
+# Chest Image Dataset for Pneumothorax Segmentation
+
+## Description
+
+This project supports **`Chest Image Dataset for Pneumothorax Segmentation`**, which can be downloaded from [here](https://tianchi.aliyun.com/dataset/83075).
+
+### Dataset Overview
+
+Pneumothorax can be caused by a blunt chest injury or by damage from underlying lung disease, or, most alarmingly, it may occur for no obvious reason at all. In some cases, a collapsed lung can be life-threatening.
+Pneumothorax is usually diagnosed by a radiologist on a chest x-ray, and can sometimes be very difficult to confirm. An accurate AI algorithm to detect pneumothorax would be useful in many clinical scenarios. AI could be used to triage chest radiographs for priority interpretation, or to provide a more confident diagnosis for non-radiologists.
+
+The dataset is provided by the Society for Imaging Informatics in Medicine (SIIM), the American College of Radiology (ACR), the Society of Thoracic Radiology (STR) and MD.ai. It can be used to develop a model that classifies (and, if present, segments) pneumothorax in a set of chest radiographic images. A successful model could aid the early recognition of pneumothoraces and save lives.
+
+### Original Statistic Information
+
+| Dataset name                                                          | Anatomical region | Task type    | Modality | Num. Classes | Train/Val/Test Images | Train/Val/Test Labeled | Release Date | License                                                            |
+| --------------------------------------------------------------------- | ----------------- | ------------ | -------- | ------------ | --------------------- | ---------------------- | ------------ | ------------------------------------------------------------------ |
+| [pneumothorax segmentation](https://tianchi.aliyun.com/dataset/83075) | thorax            | segmentation | x_ray    | 2            | 12089/-/3205          | yes/-/no               | -            | [CC-BY-SA-NC 4.0](https://creativecommons.org/licenses/by-sa/4.0/) |
+
+|    Class Name     | Num. Train | Pct. Train | Num. Val | Pct. Val | Num. Test | Pct. Test |
+| :---------------: | :--------: | :--------: | :------: | :------: | :-------: | :-------: |
+|      normal       |   12089    |   99.75    |    -     |    -     |     -     |     -     |
+| pneumothorax area |    2669    |    0.25    |    -     |    -     |     -     |     -     |
+
+Note:
+
+- `Pct` means the percentage of all pixels that belong to this category.
+
+### Visualization
+
+![chest_image_pneum](https://raw.githubusercontent.com/uni-medical/medical-datasets-visualization/main/2d/semantic_seg/x_ray/chest_image_pneum/chest_image_pneum_dataset.png)
+
+### Prerequisites
+
+- Python v3.8
+- PyTorch v1.10.0
+- [MIM](https://github.com/open-mmlab/mim) v0.3.4
+- [MMCV](https://github.com/open-mmlab/mmcv) v2.0.0rc4
+- [MMEngine](https://github.com/open-mmlab/mmengine) v0.2.0 or higher
+- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation) v1.0.0rc5
+
+All the commands below rely on the correct configuration of `PYTHONPATH`, which should point to the project's directory so that Python can locate the module files. In `chest_image_pneum/` root directory, run the following line to add the current directory to `PYTHONPATH`:
+
+```shell
+export PYTHONPATH=`pwd`:$PYTHONPATH
+```
+
+### Dataset preparation
+
+- Download the dataset from [here](https://tianchi.aliyun.com/dataset/83075) and decompress it to `data/`.
+- Run `python tools/prepare_dataset.py` to convert the data and arrange it into the folder structure shown below.
+- Run `python ../../tools/split_seg_dataset.py` to split the dataset and generate `train.txt`, `val.txt` and `test.txt`. If the labels of the official validation and test sets are not available, `train.txt` and `val.txt` are generated by randomly splitting the training set.
+
+```none
+  mmsegmentation
+  ├── mmseg
+  ├── projects
+  │   ├── medical
+  │   │   ├── 2d_image
+  │   │   │   ├── x_ray
+  │   │   │   │   ├── chest_image_pneum
+  │   │   │   │   │   ├── configs
+  │   │   │   │   │   ├── datasets
+  │   │   │   │   │   ├── tools
+  │   │   │   │   │   ├── data
+  │   │   │   │   │   │   ├── train.txt
+  │   │   │   │   │   │   ├── val.txt
+  │   │   │   │   │   │   ├── images
+  │   │   │   │   │   │   │   ├── train
+  │   │   │   │   │   │   │   │   ├── xxx.png
+  │   │   │   │   │   │   │   │   ├── ...
+  │   │   │   │   │   │   │   │   └── xxx.png
+  │   │   │   │   │   │   ├── masks
+  │   │   │   │   │   │   │   ├── train
+  │   │   │   │   │   │   │   │   ├── xxx.png
+  │   │   │   │   │   │   │   │   ├── ...
+  │   │   │   │   │   │   │   │   └── xxx.png
+```
+
+### Divided Dataset Information
+
+***Note: The statistics below describe our own split of the dataset, not an official one.***
+
+|    Class Name     | Num. Train | Pct. Train | Num. Val | Pct. Val | Num. Test | Pct. Test |
+| :---------------: | :--------: | :--------: | :------: | :------: | :-------: | :-------: |
+|      normal       |    9637    |   99.75    |   2410   |  99.74   |     -     |     -     |
+| pneumothorax area |    2137    |    0.25    |   532    |   0.26   |     -     |     -     |
+
+### Training commands
+
+Train models on a single server with one GPU.
+
+```shell
+mim train mmseg ./configs/${CONFIG_FILE}
+```
+
+### Testing commands
+
+Test models on a single server with one GPU.
+
+```shell
+mim test mmseg ./configs/${CONFIG_FILE} --checkpoint ${CHECKPOINT_PATH}
+```
+
+<!-- List the results as usually done in other model's README. [Example](https://github.com/open-mmlab/mmsegmentation/tree/dev-1.x/configs/fcn#results-and-models)
+
+You should claim whether this is based on the pre-trained weights, which are converted from the official release; or it's a reproduced result obtained from retraining the model in this project. -->
+
+## Results
+
+### Chest Image Dataset for Pneumothorax Segmentation
+
+|     Method      | Backbone | Crop Size |   lr   | mIoU | mDice |                                         config                                         |         download         |
+| :-------------: | :------: | :-------: | :----: | :--: | :---: | :------------------------------------------------------------------------------------: | :----------------------: |
+| fcn_unet_s5-d16 |   unet   |  512x512  |  0.01  |  -   |   -   |  [config](./configs/fcn-unet-s5-d16_unet_1xb16-0.01-20k_chest-image-pneum-512x512.py)  | [model](<>) \| [log](<>) |
+| fcn_unet_s5-d16 |   unet   |  512x512  | 0.001  |  -   |   -   | [config](./configs/fcn-unet-s5-d16_unet_1xb16-0.001-20k_chest-image-pneum-512x512.py)  | [model](<>) \| [log](<>) |
+| fcn_unet_s5-d16 |   unet   |  512x512  | 0.0001 |  -   |   -   | [config](./configs/fcn-unet-s5-d16_unet_1xb16-0.0001-20k_chest-image-pneum-512x512.py) | [model](<>) \| [log](<>) |
+
+## Checklist
+
+- [x] Milestone 1: PR-ready, and acceptable to be one of the `projects/`.
+
+  - [x] Finish the code
+
+  - [x] Basic docstrings & proper citation
+
+  - [x] Test-time correctness
+
+  - [x] A full README
+
+- [x] Milestone 2: Indicates a successful model implementation.
+
+  - [x] Training-time correctness
+
+- [ ] Milestone 3: Good to be a part of our core package!
+
+  - [ ] Type hints and docstrings
+
+  - [ ] Unit tests
+
+  - [ ] Code polishing
+
+  - [ ] Metafile.yml
+
+- [ ] Move your modules into the core package following the codebase's file hierarchy structure.
+
+- [ ] Refactor your modules into the core package following the codebase's file hierarchy structure.
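Note on the README's dataset preparation step: it says `train.txt` and `val.txt` are generated by randomly splitting the training set, but the split script itself is not part of this diff. The following is only a minimal sketch of such a split, not the code of `split_seg_dataset.py`. The function names, the `val_ratio=0.2` default (chosen to roughly match the 9637/2410 split reported in the README), the fixed seed, and the placeholder case ids are all assumptions made for illustration.

```python
import random


def split_ids(ids, val_ratio=0.2, seed=0):
    """Shuffle ids reproducibly and return (train_ids, val_ids).

    Hypothetical helper: val_ratio and seed are illustrative defaults.
    """
    ids = sorted(ids)  # sort first so the result does not depend on input order
    rng = random.Random(seed)  # local RNG, leaves the global random state alone
    rng.shuffle(ids)
    n_val = int(round(len(ids) * val_ratio))
    return ids[n_val:], ids[:n_val]


def write_list(path, ids):
    """Write one id per line, the usual layout of an annotation list file."""
    with open(path, "w") as f:
        f.write("\n".join(ids) + "\n")


# Placeholder ids; real ids would come from the file names under images/train.
all_ids = [f"case_{i:04d}" for i in range(10)]
train_ids, val_ids = split_ids(all_ids)
```

Calling `write_list('data/train.txt', train_ids)` and `write_list('data/val.txt', val_ids)` would then produce the two list files. Fixing the seed keeps the split reproducible across runs, which matters when results from different learning rates are compared on the same validation set.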
diff --git a/projects/medical/2d_image/x_ray/chest_image_pneum/configs/chest-image-pneum_512x512.py b/projects/medical/2d_image/x_ray/chest_image_pneum/configs/chest-image-pneum_512x512.py
@@ -0,0 +1,42 @@
+dataset_type = 'ChestImagePneumDataset'
+data_root = 'data/'
+img_scale = (512, 512)
+train_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(type='LoadAnnotations'),
+    dict(type='Resize', scale=img_scale, keep_ratio=False),
+    dict(type='RandomFlip', prob=0.5),
+    dict(type='PhotoMetricDistortion'),
+    dict(type='PackSegInputs')
+]
+test_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(type='Resize', scale=img_scale, keep_ratio=False),
+    dict(type='LoadAnnotations'),
+    dict(type='PackSegInputs')
+]
+train_dataloader = dict(
+    batch_size=16,
+    num_workers=4,
+    persistent_workers=True,
+    sampler=dict(type='InfiniteSampler', shuffle=True),
+    dataset=dict(
+        type=dataset_type,
+        data_root=data_root,
+        ann_file='train.txt',
+        data_prefix=dict(img_path='images/', seg_map_path='masks/'),
+        pipeline=train_pipeline))
+val_dataloader = dict(
+    batch_size=1,
+    num_workers=4,
+    persistent_workers=True,
+    sampler=dict(type='DefaultSampler', shuffle=False),
+    dataset=dict(
+        type=dataset_type,
+        data_root=data_root,
+        ann_file='val.txt',
+        data_prefix=dict(img_path='images/', seg_map_path='masks/'),
+        pipeline=test_pipeline))
+test_dataloader = val_dataloader
+val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU', 'mDice'])
+test_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU', 'mDice'])
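Note on the evaluators above: both request `mIoU` and `mDice`. As a reading aid, the sketch below computes per-class IoU and Dice for a single pair of label maps with NumPy. It is not MMSegmentation's `IoUMetric` implementation (which accumulates statistics over the whole dataset); the function name `iou_and_dice` and the toy arrays are assumptions made for illustration.

```python
import numpy as np


def iou_and_dice(pred, target, num_classes=2):
    """Return per-class IoU and Dice arrays for two integer label maps."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    ious, dices = [], []
    for c in range(num_classes):
        p, t = pred == c, target == c
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        total = p.sum() + t.sum()
        # A class absent from both maps has no defined score; mark it NaN.
        ious.append(inter / union if union else np.nan)
        dices.append(2 * inter / total if total else np.nan)
    return np.array(ious), np.array(dices)


# Toy 2x4 example: class 0 is 'normal', class 1 is 'pneumothorax area'.
pred = np.array([[0, 0, 1, 1],
                 [0, 0, 0, 1]])
target = np.array([[0, 0, 0, 1],
                   [0, 0, 0, 1]])
ious, dices = iou_and_dice(pred, target)
m_iou, m_dice = np.nanmean(ious), np.nanmean(dices)
```

Because the foreground covers only about 0.25% of pixels in this dataset, the background class scores close to 1 almost regardless of the model, so the mean is dominated by how well the `pneumothorax area` class is segmented.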
diff --git a/projects/medical/2d_image/x_ray/chest_image_pneum/configs/fcn-unet-s5-d16_unet_1xb16-0.0001-20k_chest-image-pneum-512x512.py b/projects/medical/2d_image/x_ray/chest_image_pneum/configs/fcn-unet-s5-d16_unet_1xb16-0.0001-20k_chest-image-pneum-512x512.py
@@ -0,0 +1,18 @@
+_base_ = [
+    './chest-image-pneum_512x512.py',
+    'mmseg::_base_/models/fcn_unet_s5-d16.py',
+    'mmseg::_base_/default_runtime.py',
+    'mmseg::_base_/schedules/schedule_20k.py'
+]
+custom_imports = dict(imports='datasets.chest-image-pneum_dataset')
+img_scale = (512, 512)
+data_preprocessor = dict(size=img_scale)
+optimizer = dict(lr=0.0001)
+optim_wrapper = dict(optimizer=optimizer)
+model = dict(
+    data_preprocessor=data_preprocessor,
+    decode_head=dict(num_classes=2),
+    auxiliary_head=None,
+    test_cfg=dict(mode='whole', _delete_=True))
+vis_backends = None
+visualizer = dict(vis_backends=vis_backends)
diff --git a/projects/medical/2d_image/x_ray/chest_image_pneum/configs/fcn-unet-s5-d16_unet_1xb16-0.001-20k_chest-image-pneum-512x512.py b/projects/medical/2d_image/x_ray/chest_image_pneum/configs/fcn-unet-s5-d16_unet_1xb16-0.001-20k_chest-image-pneum-512x512.py
@@ -0,0 +1,18 @@
+_base_ = [
+    './chest-image-pneum_512x512.py',
+    'mmseg::_base_/models/fcn_unet_s5-d16.py',
+    'mmseg::_base_/default_runtime.py',
+    'mmseg::_base_/schedules/schedule_20k.py'
+]
+custom_imports = dict(imports='datasets.chest-image-pneum_dataset')
+img_scale = (512, 512)
+data_preprocessor = dict(size=img_scale)
+optimizer = dict(lr=0.001)
+optim_wrapper = dict(optimizer=optimizer)
+model = dict(
+    data_preprocessor=data_preprocessor,
+    decode_head=dict(num_classes=2),
+    auxiliary_head=None,
+    test_cfg=dict(mode='whole', _delete_=True))
+vis_backends = None
+visualizer = dict(vis_backends=vis_backends)
diff --git a/projects/medical/2d_image/x_ray/chest_image_pneum/configs/fcn-unet-s5-d16_unet_1xb16-0.01-20k_chest-image-pneum-512x512.py b/projects/medical/2d_image/x_ray/chest_image_pneum/configs/fcn-unet-s5-d16_unet_1xb16-0.01-20k_chest-image-pneum-512x512.py
@@ -0,0 +1,18 @@
+_base_ = [
+    './chest-image-pneum_512x512.py',
+    'mmseg::_base_/models/fcn_unet_s5-d16.py',
+    'mmseg::_base_/default_runtime.py',
+    'mmseg::_base_/schedules/schedule_20k.py'
+]
+custom_imports = dict(imports='datasets.chest-image-pneum_dataset')
+img_scale = (512, 512)
+data_preprocessor = dict(size=img_scale)
+optimizer = dict(lr=0.01)
+optim_wrapper = dict(optimizer=optimizer)
+model = dict(
+    data_preprocessor=data_preprocessor,
+    decode_head=dict(num_classes=2),
+    auxiliary_head=None,
+    test_cfg=dict(mode='whole', _delete_=True))
+vis_backends = None
+visualizer = dict(vis_backends=vis_backends)
diff --git a/projects/medical/2d_image/x_ray/chest_image_pneum/datasets/chest-image-pneum_dataset.py b/projects/medical/2d_image/x_ray/chest_image_pneum/datasets/chest-image-pneum_dataset.py
@@ -0,0 +1,27 @@
+from mmseg.datasets import BaseSegDataset
+from mmseg.registry import DATASETS
+
+
+@DATASETS.register_module()
+class ChestImagePneumDataset(BaseSegDataset):
+    """ChestImagePneumDataset dataset.
+
+    In segmentation map annotation for ChestImagePneumDataset,
+    ``reduce_zero_label`` is fixed to False. The ``img_suffix``
+    is fixed to '.png' and ``seg_map_suffix`` is fixed to '.png'.
+
+    Args:
+        img_suffix (str): Suffix of images. Default: '.png'
+        seg_map_suffix (str): Suffix of segmentation maps. Default: '.png'
+    """
+    METAINFO = dict(classes=('normal', 'pneumothorax area'))
+
+    def __init__(self,
+                 img_suffix='.png',
+                 seg_map_suffix='.png',
+                 **kwargs) -> None:
+        super().__init__(
+            img_suffix=img_suffix,
+            seg_map_suffix=seg_map_suffix,
+            reduce_zero_label=False,
+            **kwargs)
diff --git a/projects/medical/2d_image/x_ray/chest_image_pneum/tools/prepare_dataset.py b/projects/medical/2d_image/x_ray/chest_image_pneum/tools/prepare_dataset.py
@@ -0,0 +1,73 @@
+import os
+
+import numpy as np
+import pandas as pd
+import pydicom
+from PIL import Image
+
+root_path = 'data/'
+img_suffix = '.dcm'
+seg_map_suffix = '.png'
+save_img_suffix = '.png'
+save_seg_map_suffix = '.png'
+
+x_train = []
+for fpath, dirname, fnames in os.walk('data/chestimage_train_datasets'):
+    for fname in fnames:
+        if fname.endswith('.dcm'):
+            x_train.append(os.path.join(fpath, fname))
+x_test = []
+for fpath, dirname, fnames in os.walk('data/chestimage_test_datasets/'):
+    for fname in fnames:
+        if fname.endswith('.dcm'):
+            x_test.append(os.path.join(fpath, fname))
+
+os.makedirs(root_path + 'images/train/', exist_ok=True)
+os.makedirs(root_path + 'images/test/', exist_ok=True)
+os.makedirs(root_path + 'masks/train/', exist_ok=True)
+
+
+def rle_decode(rle, width, height):
+    mask = np.zeros(width * height, dtype=np.uint8)
+    array = np.asarray([int(x) for x in rle.split()])
+    starts = array[0::2]  # offsets relative to the end of the previous run
+    lengths = array[1::2]  # number of foreground pixels in each run
+
+    current_position = 0
+    for index, start in enumerate(starts):
+        current_position += start
+        mask[current_position:current_position + lengths[index]] = 1
+        current_position += lengths[index]
+
+    return mask.reshape(width, height, order='F')  # fill column by column
+
+
+part_dir_dict = {0: 'train/', 1: 'test/'}
+dict_from_csv = pd.read_csv(
+    root_path + 'chestimage_train-rle_datasets.csv', sep=',',
+    index_col=0).to_dict()[' EncodedPixels']
+
+for ith, part in enumerate([x_train, x_test]):
+    part_dir = part_dir_dict[ith]
+    for img in part:
+        basename = os.path.basename(img)
+        img_id = '.'.join(basename.split('.')[:-1])
+        if ith == 0 and (img_id not in dict_from_csv.keys()):
+            continue
+        image = pydicom.dcmread(img).pixel_array
+        save_img_path = root_path + 'images/' + part_dir + '.'.join(
+            basename.split('.')[:-1]) + save_img_suffix
+        print(save_img_path)
+        img_h, img_w = image.shape[:2]
+        image = Image.fromarray(image)
+        image.save(save_img_path)
+        if ith == 1:
+            continue
+        if dict_from_csv[img_id] == '-1':
+            mask = np.zeros((img_h, img_w), dtype=np.uint8)
+        else:
+            mask = rle_decode(dict_from_csv[img_id], img_h, img_w)
+        save_mask_path = root_path + 'masks/' + part_dir + '.'.join(
+            basename.split('.')[:-1]) + save_seg_map_suffix
+        mask = Image.fromarray(mask)
+        mask.save(save_mask_path)
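Note on `rle_decode` in `tools/prepare_dataset.py`: each pair in the encoded string is read as an offset from the end of the previous run followed by a run length, and the flat mask is reshaped with `order='F'`, so pixels are filled column by column. The sketch below restates that logic on a tiny non-square mask so the behaviour can be checked by hand. The name `rle_decode_relative`, the `(height, width)` argument order, and the example string are assumptions made for illustration.

```python
import numpy as np


def rle_decode_relative(rle, height, width):
    """Decode a relative run-length string into a (height, width) mask."""
    mask = np.zeros(height * width, dtype=np.uint8)
    values = [int(x) for x in rle.split()]
    position = 0
    for offset, length in zip(values[0::2], values[1::2]):
        position += offset  # skip background pixels since the previous run
        mask[position:position + length] = 1
        position += length  # the next offset is measured from here
    # order="F" fills the 2D mask column by column, as the script does.
    return mask.reshape(height, width, order="F")


# "1 2 1 1": skip 1 pixel, mark 2, skip 1, mark 1 (6 pixels in total).
mask = rle_decode_relative("1 2 1 1", height=2, width=3)
```

Here the flat sequence is `[0, 1, 1, 0, 1, 0]`, which the column-major reshape turns into rows `[0, 1, 1]` and `[1, 0, 0]`. With an absolute-offset decoder the same string would mark different pixels, so it is worth confirming which convention the downloaded CSV uses before trusting the generated masks.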