Commit 041f1f0

[Project] Medical semantic seg dataset: breast_cancer_cell_seg (open-mmlab#2726)
1 parent b5fc5ab commit 041f1f0

7 files changed: 319 additions, 0 deletions
Lines changed: 147 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,147 @@
1+
# Breast Cancer Cell Segmentation
2+
3+
## Description
4+
5+
This project supports **`Breast Cancer Cell Segmentation`**. The dataset used in this project can be downloaded from [here](https://tianchi.aliyun.com/dataset/dataDetail?dataId=90152).
6+
7+
### Dataset Overview
8+
9+
This dataset contains 58 H&E-stained histopathology images used for breast cancer cell detection, with associated ground truth data. Routine histology uses the stain combination of hematoxylin and eosin, commonly referred to as H&E. The images are stained because most cells are essentially transparent, with little or no intrinsic pigment. Certain special stains, which bind selectively to particular components, are used to identify biological structures such as cells. The challenging problem in these images is cell segmentation, which precedes classification into benign and malignant cells.
10+
11+
### Original Statistic Information
12+
13+
| Dataset name | Anatomical region | Task type | Modality | Num. Classes | Train/Val/Test Images | Train/Val/Test Labeled | Release Date | License |
14+
| --------------------------------------------------------------------------------------------- | ----------------- | ------------ | -------------- | ------------ | --------------------- | ---------------------- | ------------ | ------------------------------------------------------------------------------------------------------ |
15+
| [Breast Cancer Cell Segmentation](https://tianchi.aliyun.com/dataset/dataDetail?dataId=90152) | thorax | segmentation | histopathology | 2 | 58/-/- | yes/-/- | 2021 | [CC-BY-SA-NC 4.0](http://creativecommons.org/licenses/by-sa/4.0/?spm=5176.12282016.0.0.3f5b5291ypBxb2) |
16+
17+
| Class Name | Num. Train | Pct. Train | Num. Val | Pct. Val | Num. Test | Pct. Test |
18+
| :----------------: | :--------: | :--------: | :------: | :------: | :-------: | :-------: |
19+
| normal | 58 | 98.37 | - | - | - | - |
20+
| breast cancer cell | 58 | 1.63 | - | - | - | - |
21+
22+
Note:
23+
24+
- `Pct` is the percentage of all pixels that belong to this class.
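
As a rough illustration of how a `Pct` column like the one above could be computed, the sketch below counts label pixels over a list of masks with NumPy. The function name `class_pixel_percentages` and the toy masks are hypothetical; this is not the script used to produce the table.

```python
import numpy as np


def class_pixel_percentages(masks, num_classes=2):
    """Return the percentage of pixels per class, pooled over all masks."""
    counts = np.zeros(num_classes, dtype=np.int64)
    for mask in masks:
        # bincount tallies how many pixels carry each label index.
        counts += np.bincount(mask.ravel(), minlength=num_classes)[:num_classes]
    return 100.0 * counts / counts.sum()


# Toy data (assumption): two 4x4 masks with 2 foreground pixels out of 32.
toy_masks = [np.zeros((4, 4), dtype=np.uint8) for _ in range(2)]
toy_masks[0][0, 0] = 1
toy_masks[1][3, 3] = 1
pct = class_pixel_percentages(toy_masks)
print(pct)
```

Pooling counts over all masks before dividing gives a dataset-level percentage, which differs from averaging per-image percentages when image sizes vary.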
25+
26+
### Visualization
27+
28+
![bac](https://raw.githubusercontent.com/uni-medical/medical-datasets-visualization/main/2d/semantic_seg/histopathology/breast_cancer_cell_seg/breast_cancer_cell_seg_dataset.png)
29+
30+
## Dataset Citation
31+
32+
```
33+
@inproceedings{gelasca2008evaluation,
34+
title={Evaluation and benchmark for biological image segmentation},
35+
author={Gelasca, Elisa Drelie and Byun, Jiyun and Obara, Boguslaw and Manjunath, BS},
36+
booktitle={2008 15th IEEE international conference on image processing},
37+
pages={1816--1819},
38+
year={2008},
39+
organization={IEEE}
40+
}
41+
```
42+
43+
### Prerequisites
44+
45+
- Python v3.8
46+
- PyTorch v1.10.0
47+
- [MIM](https://github.com/open-mmlab/mim) v0.3.4
48+
- [MMCV](https://github.com/open-mmlab/mmcv) v2.0.0rc4
49+
- [MMEngine](https://github.com/open-mmlab/mmengine) v0.2.0 or higher
50+
- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation) v1.0.0rc5
51+
52+
All the commands below rely on `PYTHONPATH` being configured correctly: it should point to the project's directory so that Python can locate the module files. From the `breast_cancer_cell_seg/` root directory, run the following line to add the current directory to `PYTHONPATH`:
53+
54+
```shell
55+
export PYTHONPATH=`pwd`:$PYTHONPATH
56+
```
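
The configs in this project name the dataset module with a dotted string (`datasets.breast-cancer-cell-seg_dataset`), so the module can only be found if the project directory is on the import path. The sketch below shows that a string-based import can resolve a hyphenated file name once its parent directory is on `sys.path`. The package `demo_datasets` and the file `demo-cell-seg_dataset.py` are hypothetical stand-ins created in a temporary directory, and the use of `importlib.import_module` is an assumption about how such strings are typically resolved.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as project_dir:
    # Hypothetical package layout standing in for the project's `datasets/` folder.
    pkg_dir = os.path.join(project_dir, 'demo_datasets')
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, 'demo-cell-seg_dataset.py'), 'w') as f:
        f.write("DATASET_NAME = 'DemoDataset'\n")

    # Equivalent in spirit to `export PYTHONPATH=`pwd`:$PYTHONPATH`.
    sys.path.insert(0, project_dir)
    importlib.invalidate_caches()
    try:
        # A hyphen is not valid in an `import` statement, but a string-based
        # import only needs the file name to match.
        module = importlib.import_module('demo_datasets.demo-cell-seg_dataset')
        loaded_name = module.DATASET_NAME
    finally:
        sys.path.remove(project_dir)

print(loaded_name)
```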
57+
58+
### Dataset preparation
59+
60+
- Download the dataset from [here](https://tianchi.aliyun.com/dataset/dataDetail?dataId=90152) and decompress it to `data/`.
61+
- Run `python tools/prepare_dataset.py` to convert the data and arrange the folder structure as shown below.
62+
- Run `python ../../tools/split_seg_dataset.py` to split the dataset and generate `train.txt`, `val.txt` and `test.txt`. If labels for the official validation and test sets are unavailable, `train.txt` and `val.txt` are generated by randomly splitting the training set.
63+
64+
```none
65+
mmsegmentation
66+
├── mmseg
67+
├── projects
68+
│ ├── medical
69+
│ │ ├── 2d_image
70+
│ │ │ ├── histopathology
71+
│ │ │ │ ├── breast_cancer_cell_seg
72+
│ │ │ │ │ ├── configs
73+
│ │ │ │ │ ├── datasets
74+
│ │ │ │ │ ├── tools
75+
│ │ │ │ │ ├── data
76+
│ │ │ │ │ │ ├── train.txt
77+
│ │ │ │ │ │ ├── val.txt
78+
│ │ │ │ │ │ ├── images
79+
│ │ │ │ │ │ │ ├── train
80+
│ │ │ │ | │ │ │ ├── xxx.png
81+
│ │ │ │ | │ │ │ ├── ...
82+
│ │ │ │ | │ │ │ └── xxx.png
83+
│ │ │ │ │ │ ├── masks
84+
│ │ │ │ │ │ │ ├── train
85+
│ │ │ │ | │ │ │ ├── xxx.png
86+
│ │ │ │ | │ │ │ ├── ...
87+
│ │ │ │ | │ │ │ └── xxx.png
88+
```
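
A random train/val split of this kind can be sketched as follows. This is not the code of `split_seg_dataset.py`: the function `split_names`, the 20% validation ratio, the seed, and the sample names are all assumptions chosen for illustration, and the line format the real script writes to `train.txt` and `val.txt` is not shown here.

```python
import random


def split_names(names, val_ratio=0.2, seed=0):
    """Shuffle sample names reproducibly and split them into train and val."""
    names = sorted(names)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(names)
    num_val = round(len(names) * val_ratio)
    return names[num_val:], names[:num_val]


# Hypothetical sample names standing in for the 58 image files.
sample_names = [f'img_{i:02d}' for i in range(58)]
train_names, val_names = split_names(sample_names)
print(len(train_names), len(val_names))  # 46 12
```

With 58 images, a 20% validation ratio gives a 46/12 split.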
89+
90+
### Divided Dataset Information
91+
92+
***Note: The split in the table below was created by us, not by the dataset authors.***
93+
94+
| Class Name | Num. Train | Pct. Train | Num. Val | Pct. Val | Num. Test | Pct. Test |
95+
| :----------: | :--------: | :--------: | :------: | :------: | :-------: | :-------: |
96+
| normal | 46 | 98.36 | 12 | 98.41 | - | - |
97+
| breast cancer cell | 46 | 1.64 | 12 | 1.59 | - | - |
98+
99+
### Training commands
100+
101+
Train models on a single server with one GPU.
102+
103+
```shell
104+
mim train mmseg ./configs/${CONFIG_FILE}
105+
```
106+
107+
### Testing commands
108+
109+
Test models on a single server with one GPU.
110+
111+
```shell
112+
mim test mmseg ./configs/${CONFIG_FILE} --checkpoint ${CHECKPOINT_PATH}
113+
```
114+
115+
<!-- List the results as usually done in other model's README. [Example](https://github.com/open-mmlab/mmsegmentation/tree/dev-1.x/configs/fcn#results-and-models)
116+
117+
You should claim whether this is based on the pre-trained weights, which are converted from the official release; or it's a reproduced result obtained from retraining the model in this project. -->
118+
119+
## Checklist
120+
121+
- [x] Milestone 1: PR-ready, and acceptable to be one of the `projects/`.
122+
123+
- [x] Finish the code
124+
125+
- [x] Basic docstrings & proper citation
126+
127+
- [x] Test-time correctness
128+
129+
- [x] A full README
130+
131+
- [ ] Milestone 2: Indicates a successful model implementation.
132+
133+
- [ ] Training-time correctness
134+
135+
- [ ] Milestone 3: Good to be a part of our core package!
136+
137+
- [ ] Type hints and docstrings
138+
139+
- [ ] Unit tests
140+
141+
- [ ] Code polishing
142+
143+
- [ ] Metafile.yml
144+
145+
- [ ] Move your modules into the core package following the codebase's file hierarchy structure.
146+
147+
- [ ] Refactor your modules into the core package following the codebase's file hierarchy structure.
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
1+
dataset_type = 'BreastCancerCellSegDataset'
2+
data_root = 'data/'
3+
img_scale = (512, 512)
4+
train_pipeline = [
5+
dict(type='LoadImageFromFile'),
6+
dict(type='LoadAnnotations'),
7+
dict(type='Resize', scale=img_scale, keep_ratio=False),
8+
dict(type='RandomFlip', prob=0.5),
9+
dict(type='PhotoMetricDistortion'),
10+
dict(type='PackSegInputs')
11+
]
12+
test_pipeline = [
13+
dict(type='LoadImageFromFile'),
14+
dict(type='Resize', scale=img_scale, keep_ratio=False),
15+
dict(type='LoadAnnotations'),
16+
dict(type='PackSegInputs')
17+
]
18+
train_dataloader = dict(
19+
batch_size=16,
20+
num_workers=4,
21+
persistent_workers=True,
22+
sampler=dict(type='InfiniteSampler', shuffle=True),
23+
dataset=dict(
24+
type=dataset_type,
25+
data_root=data_root,
26+
ann_file='train.txt',
27+
data_prefix=dict(img_path='images/', seg_map_path='masks/'),
28+
pipeline=train_pipeline))
29+
val_dataloader = dict(
30+
batch_size=1,
31+
num_workers=4,
32+
persistent_workers=True,
33+
sampler=dict(type='DefaultSampler', shuffle=False),
34+
dataset=dict(
35+
type=dataset_type,
36+
data_root=data_root,
37+
ann_file='val.txt',
38+
data_prefix=dict(img_path='images/', seg_map_path='masks/'),
39+
pipeline=test_pipeline))
40+
test_dataloader = val_dataloader
41+
val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU', 'mDice'])
42+
test_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU', 'mDice'])
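
The evaluators above request `mIoU` and `mDice`. As a reminder of what these two scores measure, the following NumPy sketch computes IoU and Dice for one class on a single toy prediction. The function `iou_and_dice` and the arrays are hypothetical, the sketch assumes the class appears in the prediction or the target (so the denominators are non-zero), and it works per image, which may differ from how the metric class aggregates over a whole dataset.

```python
import numpy as np


def iou_and_dice(pred, target, class_id):
    """Return (IoU, Dice) for one class on a pair of label maps."""
    pred_c = pred == class_id
    target_c = target == class_id
    intersection = np.logical_and(pred_c, target_c).sum()
    pred_area = pred_c.sum()
    target_area = target_c.sum()
    union = pred_area + target_area - intersection
    iou = intersection / union
    dice = 2 * intersection / (pred_area + target_area)
    return float(iou), float(dice)


# Toy 2x2 label maps (assumption): one pixel is wrongly predicted as class 1.
target = np.array([[0, 0], [1, 1]], dtype=np.uint8)
pred = np.array([[0, 1], [1, 1]], dtype=np.uint8)

scores = [iou_and_dice(pred, target, c) for c in (0, 1)]
mean_iou = sum(s[0] for s in scores) / len(scores)
mean_dice = sum(s[1] for s in scores) / len(scores)
print(scores, mean_iou, mean_dice)
```

Because the foreground class covers under 2% of the pixels in this dataset, the per-class scores are more informative than overall pixel accuracy.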
@@ -0,0 +1,18 @@
1+
_base_ = [
2+
'./breast-cancer-cell-seg_512x512.py',
3+
'mmseg::_base_/models/fcn_unet_s5-d16.py',
4+
'mmseg::_base_/default_runtime.py',
5+
'mmseg::_base_/schedules/schedule_20k.py'
6+
]
7+
custom_imports = dict(imports='datasets.breast-cancer-cell-seg_dataset')
8+
img_scale = (512, 512)
9+
data_preprocessor = dict(size=img_scale)
10+
optimizer = dict(lr=0.0001)
11+
optim_wrapper = dict(optimizer=optimizer)
12+
model = dict(
13+
data_preprocessor=data_preprocessor,
14+
decode_head=dict(num_classes=2),
15+
auxiliary_head=None,
16+
test_cfg=dict(mode='whole', _delete_=True))
17+
vis_backends = None
18+
visualizer = dict(vis_backends=vis_backends)
@@ -0,0 +1,18 @@
1+
_base_ = [
2+
'./breast-cancer-cell-seg_512x512.py',
3+
'mmseg::_base_/models/fcn_unet_s5-d16.py',
4+
'mmseg::_base_/default_runtime.py',
5+
'mmseg::_base_/schedules/schedule_20k.py'
6+
]
7+
custom_imports = dict(imports='datasets.breast-cancer-cell-seg_dataset')
8+
img_scale = (512, 512)
9+
data_preprocessor = dict(size=img_scale)
10+
optimizer = dict(lr=0.001)
11+
optim_wrapper = dict(optimizer=optimizer)
12+
model = dict(
13+
data_preprocessor=data_preprocessor,
14+
decode_head=dict(num_classes=2),
15+
auxiliary_head=None,
16+
test_cfg=dict(mode='whole', _delete_=True))
17+
vis_backends = None
18+
visualizer = dict(vis_backends=vis_backends)
@@ -0,0 +1,18 @@
1+
_base_ = [
2+
'./breast-cancer-cell-seg_512x512.py',
3+
'mmseg::_base_/models/fcn_unet_s5-d16.py',
4+
'mmseg::_base_/default_runtime.py',
5+
'mmseg::_base_/schedules/schedule_20k.py'
6+
]
7+
custom_imports = dict(imports='datasets.breast-cancer-cell-seg_dataset')
8+
img_scale = (512, 512)
9+
data_preprocessor = dict(size=img_scale)
10+
optimizer = dict(lr=0.01)
11+
optim_wrapper = dict(optimizer=optimizer)
12+
model = dict(
13+
data_preprocessor=data_preprocessor,
14+
decode_head=dict(num_classes=2),
15+
auxiliary_head=None,
16+
test_cfg=dict(mode='whole', _delete_=True))
17+
vis_backends = None
18+
visualizer = dict(vis_backends=vis_backends)
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
1+
from mmseg.datasets import BaseSegDataset
2+
from mmseg.registry import DATASETS
3+
4+
5+
@DATASETS.register_module()
6+
class BreastCancerCellSegDataset(BaseSegDataset):
7+
"""BreastCancerCellSegDataset dataset.
8+
9+
In segmentation map annotation for BreastCancerCellSegDataset,
10+
``reduce_zero_label`` is fixed to False. The ``img_suffix``
11+
is fixed to '.png' and ``seg_map_suffix`` is fixed to '.png'.
12+
13+
Args:
14+
img_suffix (str): Suffix of images. Default: '.png'
15+
seg_map_suffix (str): Suffix of segmentation maps. Default: '.png'
16+
reduce_zero_label (bool): Whether to mark label zero as ignored.
17+
Default to False.
18+
"""
19+
METAINFO = dict(classes=('normal', 'breast cancer cell'))
20+
21+
def __init__(self,
22+
img_suffix='.png',
23+
seg_map_suffix='.png',
24+
**kwargs) -> None:
25+
super().__init__(
26+
img_suffix=img_suffix,
27+
seg_map_suffix=seg_map_suffix,
28+
reduce_zero_label=False,
29+
**kwargs)
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
1+
import glob
2+
import os
3+
4+
import numpy as np
5+
from PIL import Image
6+
7+
root_path = 'data/'
8+
img_suffix = '.tif'
9+
seg_map_suffix = '.TIF'
10+
save_img_suffix = '.png'
11+
save_seg_map_suffix = '.png'
12+
13+
x_train = glob.glob(
14+
os.path.join('data/Breast Cancer Cell Segmentation_datasets/Images/*' +
15+
img_suffix))
16+
17+
os.makedirs(root_path + 'images/train/', exist_ok=True)
18+
os.makedirs(root_path + 'masks/train/', exist_ok=True)
19+
20+
D2_255_convert_dict = {0: 0, 255: 1}
21+
22+
23+
def convert_2d(img, convert_dict=D2_255_convert_dict):
24+
arr_2d = np.zeros((img.shape[0], img.shape[1]), dtype=np.uint8)
25+
for c, i in convert_dict.items():
26+
arr_2d[img == c] = i
27+
return arr_2d
28+
29+
30+
part_dir_dict = {0: 'train/'}
31+
for ith, part in enumerate([x_train]):
32+
part_dir = part_dir_dict[ith]
33+
for img in part:
34+
basename = os.path.basename(img)
35+
img_save_path = root_path + 'images/' + part_dir + basename.split(
36+
'.')[0] + save_img_suffix
37+
Image.open(img).save(img_save_path)
38+
mask_path = root_path + 'Breast Cancer Cell Segmentation_datasets/Masks/' + '_'.join( # noqa
39+
basename.split('_')[:-1]) + seg_map_suffix
40+
label = np.array(Image.open(mask_path))
41+
42+
save_mask_path = root_path + 'masks/' + part_dir + basename.split(
43+
'.')[0] + save_seg_map_suffix
44+
assert len(label.shape) == 2 and 255 in label and 1 not in label
45+
mask = convert_2d(label)
46+
mask = Image.fromarray(mask.astype(np.uint8))
47+
mask.save(save_mask_path)
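
The `convert_2d` helper above maps mask value 255 to label 1 and everything else to 0 by looping over a dictionary. For 8-bit masks, the same remapping can be sketched with a lookup table, which is a swapped-in alternative rather than what the script does. The names `LABEL_LUT` and `convert_with_lut` are hypothetical, and the sketch assumes the mask has dtype `uint8`.

```python
import numpy as np

# Lookup table for 8-bit masks: every value maps to 0 except 255, which maps to 1.
LABEL_LUT = np.zeros(256, dtype=np.uint8)
LABEL_LUT[255] = 1


def convert_with_lut(label):
    """Remap a uint8 mask to class indices by indexing into the lookup table."""
    return LABEL_LUT[label]


def convert_with_loop(label, convert_dict={0: 0, 255: 1}):
    """Reference version mirroring the dictionary loop in the script."""
    out = np.zeros(label.shape, dtype=np.uint8)
    for src, dst in convert_dict.items():
        out[label == src] = dst
    return out


toy_label = np.array([[0, 255], [255, 0]], dtype=np.uint8)
lut_mask = convert_with_lut(toy_label)
loop_mask = convert_with_loop(toy_label)
print(lut_mask)
```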
