Commit 4feba31

[Project] Medical semantic seg dataset: bccs (open-mmlab#2861)
1 parent b8b6ee6 commit 4feba31

7 files changed

+285
-0
lines changed
Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
# breastCancerCellSegmentation

## Description

This project supports **`breastCancerCellSegmentation`**, which can be downloaded from [here](https://www.heywhale.com/mw/dataset/5e9e9b35ebb37f002c625423).

### Dataset Overview

This dataset contains 58 H&E-stained histopathology images used for breast cancer cell detection, together with associated real-world data.
Conventional histology uses a combination of hematoxylin and eosin stains, commonly referred to as H&E. Tissue is stained because most cells are inherently transparent, with little or no intrinsic pigment.
Certain special stains selectively bind to specific components and can be used to identify biological structures such as cells.

### Original Statistic Information

| Dataset name                                                                                 | Anatomical region | Task type    | Modality       | Num. Classes | Train/Val/Test Images | Train/Val/Test Labeled | Release Date | License                                                         |
| -------------------------------------------------------------------------------------------- | ----------------- | ------------ | -------------- | ------------ | --------------------- | ---------------------- | ------------ | --------------------------------------------------------------- |
| [breastCancerCellSegmentation](https://www.heywhale.com/mw/dataset/5e9e9b35ebb37f002c625423) | cell              | segmentation | histopathology | 2            | 58/-/-                | yes/-/-                | 2020         | [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-sa/4.0/) |

|    Class Name    | Num. Train | Pct. Train | Num. Val | Pct. Val | Num. Test | Pct. Test |
| :--------------: | :--------: | :--------: | :------: | :------: | :-------: | :-------: |
|    background    |     58     |   98.37    |    -     |    -     |     -     |     -     |
| breastCancerCell |     58     |    1.63    |    -     |    -     |     -     |     -     |

Note:

- `Pct` is the percentage of all pixels that belong to this category.

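As a rough illustration of how a `Pct` column like the one above could be computed, the sketch below counts class indices over a set of masks. It assumes each mask is a 2D list of class indices (0 for `background`, 1 for `breastCancerCell`); the function name `pixel_percentages` and the toy masks are made up for this example and are not part of the project.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Class order follows the table above: index 0 is background.
CLASSES = ('background', 'breastCancerCell')


def pixel_percentages(masks: Iterable[List[List[int]]]) -> Dict[str, float]:
    """Return the share of pixels per class, in percent, over all masks."""
    counts: Counter = Counter()
    for mask in masks:
        for row in mask:
            counts.update(row)
    total = sum(counts.values())
    if total == 0:
        raise ValueError('no pixels found in the given masks')
    return {
        name: round(100.0 * counts[idx] / total, 2)
        for idx, name in enumerate(CLASSES)
    }


# Two toy 2x4 masks (hypothetical data): 1 foreground pixel out of 16.
toy_masks = [
    [[0, 0, 0, 0], [0, 1, 0, 0]],
    [[0, 0, 0, 0], [0, 0, 0, 0]],
]
print(pixel_percentages(toy_masks))
# {'background': 93.75, 'breastCancerCell': 6.25}
```

The strong imbalance in the real table (1.63% foreground) is the kind of figure this calculation would surface.
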
### Visualization

![bac](https://raw.githubusercontent.com/uni-medical/medical-datasets-visualization/main/2d/semantic_seg/histopathology/breastCancerCellSegmentation/breastCancerCellSegmentation_dataset.png)

## Usage

### Prerequisites

- Python v3.8
- PyTorch v1.10.0
- pillow (PIL) v9.3.0
- scikit-learn (sklearn) v1.2.0
- [MIM](https://github.com/open-mmlab/mim) v0.3.4
- [MMCV](https://github.com/open-mmlab/mmcv) v2.0.0rc4
- [MMEngine](https://github.com/open-mmlab/mmengine) v0.2.0 or higher
- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation) v1.0.0

All the commands below rely on the correct configuration of `PYTHONPATH`, which should point to the project's directory so that Python can locate the module files. In the `breastCancerCellSegmentation/` root directory, run the following line to add the current directory to `PYTHONPATH`:

```shell
export PYTHONPATH=`pwd`:$PYTHONPATH
```
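
To see why this matters, here is a small sketch that checks whether a dotted module name resolves once a directory is on the import path, which is roughly what the `PYTHONPATH` export achieves for the project's `datasets` package. The helper `module_is_importable` is a hypothetical name introduced for illustration; it only uses the standard `importlib` machinery.

```python
import importlib.util
import sys
from pathlib import Path


def module_is_importable(module_name: str, project_root: str) -> bool:
    """Return True if `module_name` resolves with `project_root` on sys.path."""
    root = str(Path(project_root).resolve())
    added = root not in sys.path
    if added:
        # Same effect as prepending the directory to PYTHONPATH.
        sys.path.insert(0, root)
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name cannot be found.
        return False
    finally:
        if added:
            sys.path.remove(root)
```

Run from the project root, a call such as `module_is_importable('datasets.breastCancerCellSegmentation_dataset', '.')` would be expected to return `True` if the directory layout matches the one described in this README.
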

### Dataset Preparing

- Download the dataset from [here](https://www.heywhale.com/mw/dataset/5e9e9b35ebb37f002c625423) and save it to the `data/` directory.
- Decompress the data into `data/`. This creates a new folder named `data/breastCancerCellSegmentation/`, which contains the original image data.
- Run `python tools/prepare_dataset.py` to generate the `train.txt` and `val.txt` split files. The resulting folder structure is shown below.

```none
mmsegmentation
├── mmseg
├── projects
│   ├── medical
│   │   ├── 2d_image
│   │   │   ├── histopathology
│   │   │   │   ├── breastCancerCellSegmentation
│   │   │   │   │   ├── configs
│   │   │   │   │   ├── datasets
│   │   │   │   │   ├── tools
│   │   │   │   │   ├── data
│   │   │   │   │   │   ├── breastCancerCellSegmentation
│   │   │   │   │   │   │   ├── train.txt
│   │   │   │   │   │   │   ├── val.txt
│   │   │   │   │   │   │   ├── images
│   │   │   │   │   │   │   │   ├── xxx.tif
│   │   │   │   │   │   │   ├── masks
│   │   │   │   │   │   │   │   ├── xxx.TIF
```

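Before training, it can be worth confirming that every image has a matching mask. The sketch below pairs file names by their stem, assuming the `_ccd.tif` image suffix and `.TIF` mask suffix used by this project's dataset class and `tools/prepare_dataset.py`. The function name `unpaired_stems` and the sample file names are hypothetical.

```python
from typing import Iterable, List, Tuple

IMG_SUFFIX = '_ccd.tif'  # image suffix used by the project's dataset class
MASK_SUFFIX = '.TIF'     # mask suffix used by the project's dataset class


def unpaired_stems(image_names: Iterable[str],
                   mask_names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (images without a mask, masks without an image), by stem."""
    img_stems = {
        n[:-len(IMG_SUFFIX)] for n in image_names if n.endswith(IMG_SUFFIX)
    }
    mask_stems = {
        n[:-len(MASK_SUFFIX)] for n in mask_names if n.endswith(MASK_SUFFIX)
    }
    return sorted(img_stems - mask_stems), sorted(mask_stems - img_stems)


# Hypothetical file names, for illustration only.
images = ['sample_a_ccd.tif', 'sample_b_ccd.tif']
masks = ['sample_a.TIF', 'sample_c.TIF']
print(unpaired_stems(images, masks))
# (['sample_b'], ['sample_c'])
```

In practice the two name lists could come from `os.listdir` on the `images/` and `masks/` folders; two empty lists mean the folders are consistent.
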
### Training commands

Train models on a single server with one GPU.

```shell
mim train mmseg ./configs/${CONFIG_FILE}
```

### Testing commands

Test models on a single server with one GPU.

```shell
mim test mmseg ./configs/${CONFIG_FILE} --checkpoint ${CHECKPOINT_PATH}
```

## Checklist

- [x] Milestone 1: PR-ready, and acceptable to be one of the `projects/`.

  - [x] Finish the code

  - [x] Basic docstrings & proper citation

  - [x] Test-time correctness

  - [x] A full README

- [ ] Milestone 2: Indicates a successful model implementation.

  - [ ] Training-time correctness

- [ ] Milestone 3: Good to be a part of our core package!

  - [ ] Type hints and docstrings

  - [ ] Unit tests

  - [ ] Code polishing

  - [ ] Metafile.yml

- [ ] Move your modules into the core package following the codebase's file hierarchy structure.

- [ ] Refactor your modules into the core package following the codebase's file hierarchy structure.
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
dataset_type = 'breastCancerCellSegmentationDataset'
data_root = 'data/breastCancerCellSegmentation'
img_scale = (512, 512)
train_pipeline = [
    dict(type='LoadImageFromFile', imdecode_backend='tifffile'),
    dict(type='LoadAnnotations', imdecode_backend='tifffile'),
    dict(type='Resize', scale=img_scale, keep_ratio=False),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='PackSegInputs')
]
test_pipeline = [
    dict(type='LoadImageFromFile', imdecode_backend='tifffile'),
    dict(type='Resize', scale=img_scale, keep_ratio=False),
    # Annotations are loaded after Resize so that only the image is resized
    # and the ground-truth mask keeps its original resolution.
    dict(type='LoadAnnotations', imdecode_backend='tifffile'),
    dict(type='PackSegInputs')
]
train_dataloader = dict(
    batch_size=16,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='InfiniteSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='train.txt',
        data_prefix=dict(img_path='images', seg_map_path='masks'),
        pipeline=train_pipeline))
val_dataloader = dict(
    batch_size=1,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='val.txt',
        data_prefix=dict(img_path='images', seg_map_path='masks'),
        pipeline=test_pipeline))
# No separate test split is created, so testing reuses the validation set.
test_dataloader = val_dataloader
val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU', 'mDice'])
test_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU', 'mDice'])
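
The evaluators above report `mIoU` and `mDice`. As a rough illustration of what those two numbers mean, and not MMSegmentation's actual `IoUMetric` implementation, the pure-Python sketch below computes both from flat lists of predicted and ground-truth labels. The function name `mean_iou_and_dice` and the toy labels are assumptions made for this example.

```python
from typing import Sequence, Tuple


def mean_iou_and_dice(pred: Sequence[int],
                      gt: Sequence[int],
                      num_classes: int = 2) -> Tuple[float, float]:
    """Return (mIoU, mDice) averaged over the classes that occur."""
    ious, dices = [], []
    for c in range(num_classes):
        tp = sum(p == c and g == c for p, g in zip(pred, gt))
        fp = sum(p == c and g != c for p, g in zip(pred, gt))
        fn = sum(p != c and g == c for p, g in zip(pred, gt))
        union = tp + fp + fn
        if union == 0:
            # Class absent from both prediction and ground truth: skip it.
            continue
        ious.append(tp / union)                   # IoU  = TP / (TP + FP + FN)
        dices.append(2 * tp / (2 * tp + fp + fn))  # Dice = 2TP / (2TP + FP + FN)
    if not ious:
        raise ValueError('no class in range occurs in pred or gt')
    return sum(ious) / len(ious), sum(dices) / len(dices)


# Toy example: six pixels, two classes (0 = background, 1 = cell).
pred_labels = [0, 0, 1, 1, 0, 0]
gt_labels = [0, 0, 0, 1, 1, 0]
miou, mdice = mean_iou_and_dice(pred_labels, gt_labels)
print(f'mIoU={miou:.4f}, mDice={mdice:.4f}')
```

With only 1.63% foreground pixels in this dataset, the background class will usually score high on both metrics, so the per-class values are often more informative than the mean.
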
@@ -0,0 +1,18 @@
_base_ = [
    'mmseg::_base_/models/fcn_unet_s5-d16.py',
    './breastCancerCellSegmentation_512x512.py',
    'mmseg::_base_/default_runtime.py',
    'mmseg::_base_/schedules/schedule_20k.py'
]
custom_imports = dict(imports='datasets.breastCancerCellSegmentation_dataset')
img_scale = (512, 512)
data_preprocessor = dict(size=img_scale)
optimizer = dict(lr=0.0001)
optim_wrapper = dict(optimizer=optimizer)
model = dict(
    data_preprocessor=data_preprocessor,
    decode_head=dict(num_classes=2),
    auxiliary_head=None,
    test_cfg=dict(mode='whole', _delete_=True))
vis_backends = None
visualizer = dict(vis_backends=vis_backends)
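
The config above overrides a few keys of its `_base_` files and uses `_delete_=True` to replace the inherited `test_cfg` rather than merge into it. The sketch below is a deliberately simplified model of that merge behavior, not MMEngine's real implementation. The function `merge_cfg` and the base values shown are hypothetical and chosen only to make the effect visible.

```python
from copy import deepcopy

DELETE_KEY = '_delete_'


def merge_cfg(base: dict, child: dict) -> dict:
    """Recursively merge `child` into a copy of `base`.

    A child dict containing `_delete_=True` replaces the base dict
    instead of being merged into it.
    """
    merged = deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict):
            value = dict(value)  # shallow copy so `child` is not mutated
            replace = value.pop(DELETE_KEY, False)
            if not replace and isinstance(merged.get(key), dict):
                merged[key] = merge_cfg(merged[key], value)
            else:
                merged[key] = merge_cfg({}, value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical base values, for illustration only.
base_model = dict(
    decode_head=dict(type='FCNHead', num_classes=19),
    auxiliary_head=dict(type='FCNHead', num_classes=19),
    test_cfg=dict(mode='slide', crop_size=256, stride=170))
child_model = dict(
    decode_head=dict(num_classes=2),
    auxiliary_head=None,
    test_cfg=dict(mode='whole', _delete_=True))

merged_model = merge_cfg(base_model, child_model)
print(merged_model['decode_head'])  # {'type': 'FCNHead', 'num_classes': 2}
print(merged_model['test_cfg'])     # {'mode': 'whole'}
```

Without `_delete_=True`, the sliding-window keys from the base `test_cfg` would survive the merge alongside `mode='whole'`, which is presumably what the config is avoiding.
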
@@ -0,0 +1,18 @@
_base_ = [
    'mmseg::_base_/models/fcn_unet_s5-d16.py',
    './breastCancerCellSegmentation_512x512.py',
    'mmseg::_base_/default_runtime.py',
    'mmseg::_base_/schedules/schedule_20k.py'
]
custom_imports = dict(imports='datasets.breastCancerCellSegmentation_dataset')
img_scale = (512, 512)
data_preprocessor = dict(size=img_scale)
optimizer = dict(lr=0.001)
optim_wrapper = dict(optimizer=optimizer)
model = dict(
    data_preprocessor=data_preprocessor,
    decode_head=dict(num_classes=2),
    auxiliary_head=None,
    test_cfg=dict(mode='whole', _delete_=True))
vis_backends = None
visualizer = dict(vis_backends=vis_backends)
@@ -0,0 +1,18 @@
_base_ = [
    'mmseg::_base_/models/fcn_unet_s5-d16.py',
    './breastCancerCellSegmentation_512x512.py',
    'mmseg::_base_/default_runtime.py',
    'mmseg::_base_/schedules/schedule_20k.py'
]
custom_imports = dict(imports='datasets.breastCancerCellSegmentation_dataset')
img_scale = (512, 512)
data_preprocessor = dict(size=img_scale)
optimizer = dict(lr=0.01)
optim_wrapper = dict(optimizer=optimizer)
model = dict(
    data_preprocessor=data_preprocessor,
    decode_head=dict(num_classes=2),
    auxiliary_head=None,
    test_cfg=dict(mode='whole', _delete_=True))
vis_backends = None
visualizer = dict(vis_backends=vis_backends)
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
from mmseg.datasets import BaseSegDataset
from mmseg.registry import DATASETS


@DATASETS.register_module()
class breastCancerCellSegmentationDataset(BaseSegDataset):
    """breastCancerCellSegmentationDataset dataset.

    In segmentation map annotation for breastCancerCellSegmentationDataset,
    ``reduce_zero_label`` is fixed to False. The ``img_suffix``
    is fixed to '_ccd.tif' and ``seg_map_suffix`` is fixed to '.TIF'.

    Args:
        img_suffix (str): Suffix of images. Default: '_ccd.tif'
        seg_map_suffix (str): Suffix of segmentation maps. Default: '.TIF'
        reduce_zero_label (bool): Whether to mark label zero as ignored.
            Default to False.
    """
    METAINFO = dict(classes=('background', 'breastCancerCell'))

    def __init__(self,
                 img_suffix='_ccd.tif',
                 seg_map_suffix='.TIF',
                 reduce_zero_label=False,
                 **kwargs) -> None:
        super().__init__(
            img_suffix=img_suffix,
            seg_map_suffix=seg_map_suffix,
            reduce_zero_label=reduce_zero_label,
            **kwargs)
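
To make the role of the two suffixes concrete, the sketch below shows how one stem from `train.txt` or `val.txt` would plausibly map to an image path and a mask path, given the `data_root` and `data_prefix` values in the dataset config. This is an approximation of how the base dataset class resolves files, written from the config values in this commit. The helper `resolve_sample` and the stem `sample_a` are hypothetical.

```python
import posixpath
from typing import Dict


def resolve_sample(stem: str,
                   data_root: str = 'data/breastCancerCellSegmentation',
                   img_dir: str = 'images',
                   mask_dir: str = 'masks',
                   img_suffix: str = '_ccd.tif',
                   seg_map_suffix: str = '.TIF') -> Dict[str, str]:
    """Build the image and mask paths for one annotation-file entry."""
    return {
        'img_path': posixpath.join(data_root, img_dir, stem + img_suffix),
        'seg_map_path': posixpath.join(data_root, mask_dir,
                                       stem + seg_map_suffix),
    }


# 'sample_a' is a made-up stem, standing in for one line of train.txt.
print(resolve_sample('sample_a'))
```

If the suffixes passed to the dataset class do not match the files on disk, every resolved path would point at a missing file, which is why the defaults here differ from the usual '.png'.
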
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
import argparse
import glob
import os

from sklearn.model_selection import train_test_split


def save_anno(img_list, file_path, suffix):
    # Keep only the file name, without the suffix
    img_list = [os.path.basename(x)[:-len(suffix)] for x in img_list]

    with open(file_path, 'w') as file_:
        for x in img_list:
            file_.write(x + '\n')


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--data_root', default='data/breastCancerCellSegmentation/')
    args = parser.parse_args()
    data_root = args.data_root

    # 1. Split the data into training and validation sets
    # 1.1 Collect all image paths
    img_list = glob.glob(os.path.join(data_root, 'images', '*.tif'))
    img_list.sort()
    mask_list = glob.glob(os.path.join(data_root, 'masks', '*.TIF'))
    mask_list.sort()
    assert len(img_list) == len(mask_list)
    # 1.2 Split into training and validation sets (no test set is created)
    train_img_list, val_img_list, train_mask_list, val_mask_list = train_test_split(  # noqa
        img_list, mask_list, test_size=0.2, random_state=42)
    # 1.3 Save the split results
    save_anno(train_img_list, os.path.join(data_root, 'train.txt'), '_ccd.tif')
    save_anno(val_img_list, os.path.join(data_root, 'val.txt'), '_ccd.tif')
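
For orientation, the sketch below estimates how many of the 58 images end up in each split with `test_size=0.2`. It assumes scikit-learn rounds a fractional held-out size up and gives the remainder to training, which is my understanding of `train_test_split` but worth confirming against its documentation. The helper `split_sizes` is a hypothetical name.

```python
import math
from typing import Tuple


def split_sizes(n_samples: int, test_size: float = 0.2) -> Tuple[int, int]:
    """Return (n_train, n_val), assuming the held-out part is rounded up."""
    n_val = math.ceil(test_size * n_samples)
    return n_samples - n_val, n_val


print(split_sizes(58))
# (46, 12)
```

Because `random_state=42` is fixed in the script, repeated runs on the same sorted file list should reproduce the same `train.txt` and `val.txt`.
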
