Commit 30e3b49

[Project] Medical semantic seg dataset: Pcam (#2684)
1 parent 942b054 commit 30e3b49

9 files changed: +345 -1 lines changed
Lines changed: 153 additions & 0 deletions
@@ -0,0 +1,153 @@
# PCam (PatchCamelyon)

## Description

This project supports **`Patch Camelyon (PCam)`**, which can be downloaded from [here](https://opendatalab.com/PCam).

### Dataset Overview

PatchCamelyon is an image classification dataset. It consists of 327680 color images (96 x 96 px) extracted from histopathologic scans of lymph node sections. Each image is annotated with a binary label indicating the presence of metastatic tissue. PCam provides a new benchmark for machine learning models: bigger than CIFAR10, smaller than ImageNet, trainable on a single GPU.

### Statistic Information

| Dataset Name                         | Anatomical Region | Task Type    | Modality       | Num. Classes | Train/Val/Test images | Train/Val/Test Labeled | Release Date | License                                                       |
| ------------------------------------ | ----------------- | ------------ | -------------- | ------------ | --------------------- | ---------------------- | ------------ | ------------------------------------------------------------- |
| [PCam](https://opendatalab.com/PCam) | thorax            | segmentation | histopathology | 2            | 327680/-/-            | yes/-/-                | 2018         | [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/) |

|    Class Name     | Num. Train | Pct. Train | Num. Val | Pct. Val | Num. Test | Pct. Test |
| :---------------: | :--------: | :--------: | :------: | :------: | :-------: | :-------: |
|    background     |   214849   |   63.77    |    -     |    -     |     -     |     -     |
| metastatic tissue |   131832   |   36.22    |    -     |    -     |     -     |     -     |

Note:

- `Pct` means the percentage of pixels in this category out of all pixels.
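
The `Pct` definition above can be illustrated with a minimal sketch. It assumes masks are available as 2D grids of integer labels (0 for background, 1 for metastatic tissue); the helper name `class_pixel_percentages` and the toy masks are hypothetical and are not part of this project.

```python
from collections import Counter


def class_pixel_percentages(masks, class_names):
    """Return {class_name: percentage of all pixels carrying that label}."""
    counts = Counter()
    for mask in masks:  # each mask is a 2D list of integer labels
        for row in mask:
            counts.update(row)
    total = sum(counts.values())
    return {
        name: round(100.0 * counts[label] / total, 2)
        for label, name in enumerate(class_names)
    }


# Two toy 2x2 masks standing in for real mask PNGs (hypothetical data).
toy_masks = [[[0, 0], [0, 1]], [[0, 1], [1, 0]]]
pcts = class_pixel_percentages(toy_masks, ('background', 'metastatic tissue'))
print(pcts)
```

For real data, the same counting would presumably run over the decoded mask PNGs rather than over Python lists.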

### Visualization

![pcam](https://raw.githubusercontent.com/uni-medical/medical-datasets-visualization/main/2d/semantic_seg/histopathology/pcam/pcam_dataset.png?raw=true)

### Dataset Citation

```
@inproceedings{veeling2018rotation,
  title={Rotation equivariant CNNs for digital pathology},
  author={Veeling, Bastiaan S and Linmans, Jasper and Winkens, Jim and Cohen, Taco and Welling, Max},
  booktitle={International Conference on Medical image computing and computer-assisted intervention},
  pages={210--218},
  year={2018},
}
```

### Prerequisites

- Python v3.8
- PyTorch v1.10.0
- pillow(PIL) v9.3.0
- scikit-learn(sklearn) v1.2.0
- [MIM](https://github.com/open-mmlab/mim) v0.3.4
- [MMCV](https://github.com/open-mmlab/mmcv) v2.0.0rc4
- [MMEngine](https://github.com/open-mmlab/mmengine) v0.2.0 or higher
- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation) v1.0.0rc5

All the commands below rely on a correctly configured `PYTHONPATH`, which should point to the project's directory so that Python can locate the module files. From the `pcam/` root directory, run the following line to add the current directory to `PYTHONPATH`:

```shell
export PYTHONPATH=`pwd`:$PYTHONPATH
```

### Dataset Preparing

- Download the dataset from [here](https://opendatalab.com/PCam) and decompress it to the path `'data/'`.
- Run `python tools/prepare_dataset.py` to format the data and arrange the folder structure as shown below.
- Run `python ../../tools/split_seg_dataset.py` to split the dataset and generate `train.txt`, `val.txt` and `test.txt`. If the labels of the official validation and test sets cannot be obtained, `train.txt` and `val.txt` are generated by randomly splitting the training set.

```shell
mkdir data && cd data
pip install opendatalab
odl get PCam
mv ./PCam/raw/pcamv1 ./
rm -rf PCam
cd ..
python tools/prepare_dataset.py
python ../../tools/split_seg_dataset.py
```
75+
76+
```none
77+
mmsegmentation
78+
├── mmseg
79+
├── projects
80+
│ ├── medical
81+
│ │ ├── 2d_image
82+
│ │ │ ├── histopathology
83+
│ │ │ │ ├── pcam
84+
│ │ │ │ │ ├── configs
85+
│ │ │ │ │ ├── datasets
86+
│ │ │ │ │ ├── tools
87+
│ │ │ │ │ ├── data
88+
│ │ │ │ │ │ ├── train.txt
89+
│ │ │ │ │ │ ├── val.txt
90+
│ │ │ │ │ │ ├── images
91+
│ │ │ │ │ │ │ ├── train
92+
│ │ │ │ | │ │ │ ├── xxx.png
93+
│ │ │ │ | │ │ │ ├── ...
94+
│ │ │ │ | │ │ │ └── xxx.png
95+
│ │ │ │ │ │ ├── masks
96+
│ │ │ │ │ │ │ ├── train
97+
│ │ │ │ | │ │ │ ├── xxx.png
98+
│ │ │ │ | │ │ │ ├── ...
99+
│ │ │ │ | │ │ │ └── xxx.png
100+
```
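
The random train/val split described in the preparation steps can be sketched roughly as follows. This is not the actual `split_seg_dataset.py`, which is not part of this commit. The function name, the 0.2 validation ratio, the fixed seed and the `train/<stem>` entry format are all assumptions chosen to match the folder layout above.

```python
import os
import random
import tempfile


def split_train_val(data_root, val_ratio=0.2, seed=0):
    """Write train.txt / val.txt listing image stems from images/train."""
    img_dir = os.path.join(data_root, 'images', 'train')
    stems = sorted(
        os.path.splitext(name)[0] for name in os.listdir(img_dir)
        if name.endswith('.png'))
    # A seeded shuffle keeps the split reproducible between runs.
    random.Random(seed).shuffle(stems)
    num_val = int(len(stems) * val_ratio)
    splits = {'val.txt': stems[:num_val], 'train.txt': stems[num_val:]}
    for file_name, names in splits.items():
        with open(os.path.join(data_root, file_name), 'w') as f:
            # Assumed entry format: path relative to `images/`, no suffix.
            f.writelines(f'train/{name}\n' for name in names)
    return len(splits['train.txt']), len(splits['val.txt'])


# Demo on a throwaway folder with ten empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'images', 'train'))
    for i in range(10):
        open(os.path.join(tmp, 'images', 'train', f'{i:08d}.png'), 'w').close()
    num_train, num_val = split_train_val(tmp)
    with open(os.path.join(tmp, 'val.txt')) as f:
        val_lines = f.read().splitlines()
print(num_train, num_val)
```

Because both lists are drawn from `images/train`, the validation images stay in the training folder and only the text files distinguish the two subsets.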
101+
102+
### Divided Dataset Information
103+
104+
***Note: The table information below is divided by ourselves.***
105+
106+
| Class Name | Num. Train | Pct. Train | Num. Val | Pct. Val | Num. Test | Pct. Test |
107+
| :---------------: | :--------: | :--------: | :------: | :------: | :-------: | :-------: |
108+
| background | 171948 | 63.82 | 42901 | 63.6 | - | - |
109+
| metastatic tissue | 105371 | 36.18 | 26461 | 36.4 | - | - |

### Training commands

To train a model on a single server with one GPU (the default):

```shell
mim train mmseg ./configs/${CONFIG_FILE}
```

### Testing commands

To test a model on a single server with one GPU (the default):

```shell
mim test mmseg ./configs/${CONFIG_FILE} --checkpoint ${CHECKPOINT_PATH}
```

<!-- List the results as usually done in other model's README. [Example](https://github.com/open-mmlab/mmsegmentation/tree/dev-1.x/configs/fcn#results-and-models)

You should claim whether this is based on the pre-trained weights, which are converted from the official release; or it's a reproduced result obtained from retraining the model in this project. -->

## Checklist

- [x] Milestone 1: PR-ready, and acceptable to be one of the `projects/`.

  - [x] Finish the code
  - [x] Basic docstrings & proper citation
  - [ ] Test-time correctness
  - [x] A full README

- [ ] Milestone 2: Indicates a successful model implementation.

  - [ ] Training-time correctness

- [ ] Milestone 3: Good to be a part of our core package!

  - [ ] Type hints and docstrings
  - [ ] Unit tests
  - [ ] Code polishing
  - [ ] Metafile.yml

- [ ] Move your modules into the core package following the codebase's file hierarchy structure.

- [ ] Refactor your modules into the core package following the codebase's file hierarchy structure.
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
_base_ = [
    'mmseg::_base_/models/fcn_unet_s5-d16.py', './pcam_512x512.py',
    'mmseg::_base_/default_runtime.py',
    'mmseg::_base_/schedules/schedule_20k.py'
]
custom_imports = dict(imports='datasets.pcam_dataset')
img_scale = (512, 512)
data_preprocessor = dict(size=img_scale)
optimizer = dict(lr=0.0001)
optim_wrapper = dict(optimizer=optimizer)
model = dict(
    data_preprocessor=data_preprocessor,
    decode_head=dict(num_classes=2),
    auxiliary_head=None,
    test_cfg=dict(mode='whole', _delete_=True))
vis_backends = None
visualizer = dict(vis_backends=vis_backends)
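
The config above overrides only a few keys of the files listed in `_base_`. The sketch below imitates that merge in plain Python to show what `_delete_=True` and `auxiliary_head=None` are meant to achieve. It is a simplified illustration, not MMEngine's implementation; `merge_cfg` and the values in `base_model` are hypothetical stand-ins.

```python
import copy


def merge_cfg(base, override):
    """Recursively merge `override` into a copy of `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict):
            value = dict(value)  # do not mutate the caller's dict
            delete = value.pop('_delete_', False)
            if delete or not isinstance(merged.get(key), dict):
                # `_delete_=True` discards the inherited dict entirely.
                merged[key] = merge_cfg({}, value)
            else:
                merged[key] = merge_cfg(merged[key], value)
        else:
            merged[key] = value  # scalars and None simply overwrite
    return merged


# Hypothetical stand-in for the inherited base model config.
base_model = dict(
    decode_head=dict(type='FCNHead', num_classes=19),
    auxiliary_head=dict(type='FCNHead', num_classes=19),
    test_cfg=dict(mode='slide', crop_size=256, stride=170))

override = dict(
    decode_head=dict(num_classes=2),
    auxiliary_head=None,
    test_cfg=dict(mode='whole', _delete_=True))

model = merge_cfg(base_model, override)
print(model)
```

Without `_delete_=True`, the inherited `crop_size` and `stride` keys would survive the merge alongside `mode='whole'`, which is presumably why the config drops them explicitly.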
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
_base_ = [
    'mmseg::_base_/models/fcn_unet_s5-d16.py', './pcam_512x512.py',
    'mmseg::_base_/default_runtime.py',
    'mmseg::_base_/schedules/schedule_20k.py'
]
custom_imports = dict(imports='datasets.pcam_dataset')
img_scale = (512, 512)
data_preprocessor = dict(size=img_scale)
optimizer = dict(lr=0.001)
optim_wrapper = dict(optimizer=optimizer)
model = dict(
    data_preprocessor=data_preprocessor,
    decode_head=dict(num_classes=2),
    auxiliary_head=None,
    test_cfg=dict(mode='whole', _delete_=True))
vis_backends = None
visualizer = dict(vis_backends=vis_backends)
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
_base_ = [
    'mmseg::_base_/models/fcn_unet_s5-d16.py', './pcam_512x512.py',
    'mmseg::_base_/default_runtime.py',
    'mmseg::_base_/schedules/schedule_20k.py'
]
custom_imports = dict(imports='datasets.pcam_dataset')
img_scale = (512, 512)
data_preprocessor = dict(size=img_scale)
optimizer = dict(lr=0.01)
optim_wrapper = dict(optimizer=optimizer)
model = dict(
    data_preprocessor=data_preprocessor,
    decode_head=dict(num_classes=2),
    auxiliary_head=None,
    test_cfg=dict(mode='whole', _delete_=True))
vis_backends = None
visualizer = dict(vis_backends=vis_backends)
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
_base_ = [
    'mmseg::_base_/models/fcn_unet_s5-d16.py', './pcam_512x512.py',
    'mmseg::_base_/default_runtime.py',
    'mmseg::_base_/schedules/schedule_20k.py'
]
custom_imports = dict(imports='datasets.pcam_dataset')
img_scale = (512, 512)
data_preprocessor = dict(size=img_scale)
optimizer = dict(lr=0.01)
optim_wrapper = dict(optimizer=optimizer)
model = dict(
    data_preprocessor=data_preprocessor,
    decode_head=dict(
        num_classes=2, loss_decode=dict(use_sigmoid=True), out_channels=1),
    auxiliary_head=None,
    test_cfg=dict(mode='whole', _delete_=True))
vis_backends = None
visualizer = dict(vis_backends=vis_backends)
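
This last config differs from the previous ones by using a single output channel with a sigmoid loss instead of two channels. The sketch below contrasts the two ways of turning logits into a binary mask. It is an illustration only: the helper names are hypothetical, and the 0.5 threshold is an assumed value that need not match what MMSegmentation applies.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def binary_mask_from_logits(logits, threshold=0.5):
    """Single-channel case: label 1 where sigmoid(logit) > threshold."""
    return [[int(sigmoid(v) > threshold) for v in row] for row in logits]


def argmax_mask_from_logits(bg_logits, fg_logits):
    """Two-channel case: pick the channel with the larger logit."""
    return [[int(fg > bg) for bg, fg in zip(bg_row, fg_row)]
            for bg_row, fg_row in zip(bg_logits, fg_logits)]


# Toy 2x2 logit grid (hypothetical values).
logits = [[-2.0, 0.3], [1.5, -0.1]]
mask = binary_mask_from_logits(logits)
two_channel_mask = argmax_mask_from_logits([[0.0, 0.0]], [[1.0, -1.0]])
print(mask, two_channel_mask)
```

With one channel the decision boundary is a tunable threshold, whereas with two channels it is fixed by the comparison between the class logits.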
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
dataset_type = 'PCamDataset'
data_root = 'data/'
img_scale = (512, 512)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', scale=img_scale, keep_ratio=False),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='PackSegInputs')
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=img_scale, keep_ratio=False),
    dict(type='LoadAnnotations'),
    dict(type='PackSegInputs')
]
train_dataloader = dict(
    batch_size=16,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='InfiniteSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='train.txt',
        data_prefix=dict(img_path='images/', seg_map_path='masks/'),
        pipeline=train_pipeline))
val_dataloader = dict(
    batch_size=1,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='val.txt',
        data_prefix=dict(img_path='images/', seg_map_path='masks/'),
        pipeline=test_pipeline))
test_dataloader = val_dataloader
val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU', 'mDice'])
test_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU', 'mDice'])
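
The evaluators above request `mIoU` and `mDice`. As a rough illustration of what those two numbers measure, the sketch below computes per-class IoU and Dice on flat label sequences and averages them. It is a plain-Python approximation for intuition, not the `IoUMetric` implementation; the function name and toy labels are hypothetical.

```python
def iou_and_dice(pred, target, num_classes=2):
    """Per-class (IoU, Dice) over flat sequences of integer labels."""
    results = {}
    for cls in range(num_classes):
        inter = sum(p == cls and t == cls for p, t in zip(pred, target))
        pred_area = sum(p == cls for p in pred)
        target_area = sum(t == cls for t in target)
        union = pred_area + target_area - inter
        denom = pred_area + target_area
        # A class absent from both prediction and target has no defined score.
        iou = inter / union if union else float('nan')
        dice = 2 * inter / denom if denom else float('nan')
        results[cls] = (iou, dice)
    return results


# Six toy pixels: 0 = background, 1 = metastatic tissue.
pred = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
scores = iou_and_dice(pred, target)
miou = sum(iou for iou, _ in scores.values()) / len(scores)
mdice = sum(dice for _, dice in scores.values()) / len(scores)
print(miou, mdice)
```

Dice weights the overlap twice, so for the same prediction it is never lower than IoU.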
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
from mmseg.datasets import BaseSegDataset
from mmseg.registry import DATASETS


@DATASETS.register_module()
class PCamDataset(BaseSegDataset):
    """PCamDataset dataset.

    In segmentation map annotation for PCamDataset,
    0 stands for background, which is included in 2 categories.
    ``reduce_zero_label`` is fixed to False. The ``img_suffix``
    is fixed to '.png' and ``seg_map_suffix`` is fixed to '.png'.

    Args:
        img_suffix (str): Suffix of images. Default: '.png'
        seg_map_suffix (str): Suffix of segmentation maps. Default: '.png'
        reduce_zero_label (bool): Whether to mark label zero as ignored.
            Default to False.
    """
    METAINFO = dict(classes=('background', 'metastatic tissue'))

    def __init__(self,
                 img_suffix='.png',
                 seg_map_suffix='.png',
                 reduce_zero_label=False,
                 **kwargs) -> None:
        super().__init__(
            img_suffix=img_suffix,
            seg_map_suffix=seg_map_suffix,
            reduce_zero_label=reduce_zero_label,
            **kwargs)
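
To make the role of the two suffix arguments concrete, the sketch below shows how an entry from `train.txt` could be expanded into an image path and a mask path. It follows my understanding of how `BaseSegDataset` resolves annotation-file entries and should be read as an assumption; `resolve_pairs` and the `train/<stem>` entry format are hypothetical.

```python
import os.path as osp


def resolve_pairs(ann_lines,
                  data_root='data/',
                  img_prefix='images/',
                  seg_prefix='masks/',
                  img_suffix='.png',
                  seg_map_suffix='.png'):
    """Expand annotation-file entries into image / mask path pairs."""
    pairs = []
    for line in ann_lines:
        name = line.strip()
        if not name:
            continue  # skip blank lines
        pairs.append(
            dict(
                img_path=osp.join(data_root, img_prefix, name + img_suffix),
                seg_map_path=osp.join(data_root, seg_prefix,
                                      name + seg_map_suffix)))
    return pairs


# Assumed entry format: path relative to the prefix, without a suffix.
pairs = resolve_pairs(['train/00000000\n', 'train/00000001\n'])
print(pairs[0])
```

Under this reading, an image and its mask must share the same stem and differ only by folder prefix.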
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
import os

import h5py
import numpy as np
from PIL import Image

root_path = 'data/'

tgt_img_train_dir = os.path.join(root_path, 'images/train/')
tgt_mask_train_dir = os.path.join(root_path, 'masks/train/')
tgt_img_val_dir = os.path.join(root_path, 'images/val/')
tgt_img_test_dir = os.path.join(root_path, 'images/test/')

# Create the output folders portably instead of shelling out to `mkdir -p`.
for tgt_dir in (tgt_img_train_dir, tgt_mask_train_dir, tgt_img_val_dir,
                tgt_img_test_dir):
    os.makedirs(tgt_dir, exist_ok=True)


def extract_pics_from_h5(h5_path, h5_key, save_dir):
    """Save every array stored under ``h5_key`` as a zero-padded PNG."""
    # The context manager closes the HDF5 file even if saving fails.
    with h5py.File(h5_path, 'r') as f:
        for i, img in enumerate(f[h5_key]):
            img = img.astype(np.uint8).squeeze()
            img = Image.fromarray(img)
            save_image_path = os.path.join(save_dir, str(i).zfill(8) + '.png')
            img.save(save_image_path)


if __name__ == '__main__':

    extract_pics_from_h5(
        'data/pcamv1/camelyonpatch_level_2_split_train_x.h5',
        h5_key='x',
        save_dir=tgt_img_train_dir)

    extract_pics_from_h5(
        'data/pcamv1/camelyonpatch_level_2_split_valid_x.h5',
        h5_key='x',
        save_dir=tgt_img_val_dir)

    extract_pics_from_h5(
        'data/pcamv1/camelyonpatch_level_2_split_test_x.h5',
        h5_key='x',
        save_dir=tgt_img_test_dir)

    extract_pics_from_h5(
        'data/pcamv1/camelyonpatch_level_2_split_train_mask.h5',
        h5_key='mask',
        save_dir=tgt_mask_train_dir)
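
Because the script names every output file by its index in the HDF5 array, image `i` and mask `i` end up with the same stem. A small sanity check along the following lines could confirm that the extraction produced matching pairs. The helper `unpaired_stems` is hypothetical and not part of this commit; the demo runs on empty placeholder files.

```python
import os
import tempfile


def unpaired_stems(img_dir, mask_dir):
    """Return file stems present in only one of the two folders."""
    img_stems = {os.path.splitext(n)[0] for n in os.listdir(img_dir)}
    mask_stems = {os.path.splitext(n)[0] for n in os.listdir(mask_dir)}
    return sorted(img_stems ^ mask_stems)


# Demo on a throwaway folder: three images but only two masks.
with tempfile.TemporaryDirectory() as tmp:
    img_dir = os.path.join(tmp, 'images', 'train')
    mask_dir = os.path.join(tmp, 'masks', 'train')
    os.makedirs(img_dir)
    os.makedirs(mask_dir)
    for i in range(3):
        open(os.path.join(img_dir, str(i).zfill(8) + '.png'), 'w').close()
    for i in range(2):
        open(os.path.join(mask_dir, str(i).zfill(8) + '.png'), 'w').close()
    missing = unpaired_stems(img_dir, mask_dir)
print(missing)
```

An empty result on the real `data/images/train` and `data/masks/train` folders would indicate that every training image has a mask with the same name.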

projects/medical/2d_image/microscopy_images/2pm_vessel/README.md

Lines changed: 1 addition & 1 deletion
@@ -69,7 +69,7 @@ pip install opendatalab
 odl get 2-PM_Vessel_Dataset
 cd ..
 python tools/prepare_dataset.py
-python tools/prepare_dataset.py
+python ../../tools/split_seg_dataset.py
 ```

 ```none
