Commit c1de52a

[Project] Medical semantic seg dataset: Covid 19 ct cxr (#2688)
1 parent e4db1f2 commit c1de52a

8 files changed: 352 additions, 0 deletions
Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
# Covid-19 CT Chest X-ray Dataset

## Description

This project supports **`Covid-19 CT Chest X-ray Dataset`**, which can be downloaded from [here](https://github.com/ieee8023/covid-chestxray-dataset).

### Dataset Overview

In the context of the COVID-19 pandemic, we want to improve prognostic predictions to triage and manage patient care. Data is the first step to developing any diagnostic/prognostic tool. While there exist large public datasets of more typical chest X-rays from the NIH \[Wang 2017\], Spain \[Bustos 2019\], Stanford \[Irvin 2019\], MIT \[Johnson 2019\] and Indiana University \[Demner-Fushman 2016\], there is no collection of COVID-19 chest X-rays or CT scans designed to be used for computational analysis.

The 2019 novel coronavirus (COVID-19) presents several unique features [Fang, 2020](https://pubs.rsna.org/doi/10.1148/radiol.2020200432) and [Ai 2020](https://pubs.rsna.org/doi/10.1148/radiol.2020200642). While the diagnosis is confirmed using polymerase chain reaction (PCR), infected patients with pneumonia may present on chest X-ray and computed tomography (CT) images with a pattern that is only moderately characteristic for the human eye [Ng, 2020](https://pubs.rsna.org/doi/10.1148/ryct.2020200034). In late January, a Chinese team published a paper detailing the clinical and paraclinical features of COVID-19. They reported that patients present abnormalities in chest CT images, with most having bilateral involvement [Huang 2020](<https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(20)30183-5/fulltext>). Bilateral multiple lobular and subsegmental areas of consolidation constitute the typical findings in chest CT images of intensive care unit (ICU) patients on admission [Huang 2020](<https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(20)30183-5/fulltext>). In comparison, non-ICU patients show bilateral ground-glass opacity and subsegmental areas of consolidation in their chest CT images [Huang 2020](<https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(20)30183-5/fulltext>). In these patients, later chest CT images display bilateral ground-glass opacity with resolved consolidation [Huang 2020](<https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(20)30183-5/fulltext>).

### Statistic Information

| Dataset Name                                                           | Anatomical Region | Task Type    | Modality | Num. Classes | Train/Val/Test Images | Train/Val/Test Labeled | Release date | License                                                               |
| ---------------------------------------------------------------------- | ----------------- | ------------ | -------- | ------------ | --------------------- | ---------------------- | ------------ | --------------------------------------------------------------------- |
| [Covid-19-ct-cxr](https://github.com/ieee8023/covid-chestxray-dataset) | thorax            | segmentation | x_ray    | 2            | 205/-/714             | yes/-/no               | 2021         | [CC-BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) |

| Class Name | Num. Train | Pct. Train | Num. Val | Pct. Val | Num. Test | Pct. Test |
| :--------: | :--------: | :--------: | :------: | :------: | :-------: | :-------: |
| background |    205     |   72.84    |    -     |    -     |     -     |     -     |
|    lung    |    205     |   27.16    |    -     |    -     |     -     |     -     |

Note:

- `Pct` is the percentage of all pixels that belong to the given class.
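
As an illustration of how a `Pct` column like the one above could be computed, here is a minimal sketch. It assumes masks are already stored as class indices (0 for background, 1 for lung); the helper name `pixel_percentages` and the toy masks are hypothetical, not part of this project, and plain nested lists stand in for image arrays.

```python
from collections import Counter


def pixel_percentages(masks, class_names):
    """Return {class name: percentage of all pixels} over a list of masks.

    Each mask is a 2D list of class indices (0 = background, 1 = lung).
    """
    counts = Counter()
    for mask in masks:
        for row in mask:
            counts.update(row)
    total = sum(counts.values())
    return {
        name: round(100.0 * counts[idx] / total, 2)
        for idx, name in enumerate(class_names)
    }


# Two toy 2x4 masks: 12 background pixels and 4 lung pixels in total.
toy_masks = [
    [[0, 0, 1, 1],
     [0, 0, 0, 0]],
    [[0, 1, 1, 0],
     [0, 0, 0, 0]],
]
pct = pixel_percentages(toy_masks, ('background', 'lung'))
print(pct)  # {'background': 75.0, 'lung': 25.0}
```

The percentages are taken over the pooled pixels of all masks, so large images weigh more than small ones.
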

### Visualization

![cov19ctcxr](https://raw.githubusercontent.com/uni-medical/medical-datasets-visualization/main/2d/semantic_seg/x_ray/covid_19_ct_cxr/covid_19_ct_cxr_dataset.png?raw=true)

### Dataset Citation

```
@article{cohen2020covidProspective,
  title={{COVID-19} Image Data Collection: Prospective Predictions Are the Future},
  author={Joseph Paul Cohen and Paul Morrison and Lan Dao and Karsten Roth and Tim Q Duong and Marzyeh Ghassemi},
  journal={arXiv 2006.11988},
  year={2020}
}

@article{cohen2020covid,
  title={COVID-19 image data collection},
  author={Joseph Paul Cohen and Paul Morrison and Lan Dao},
  journal={arXiv 2003.11597},
  year={2020}
}
```

### Prerequisites

- Python v3.8
- PyTorch v1.10.0
- pillow (PIL) v9.3.0
- scikit-learn (sklearn) v1.2.0
- [MIM](https://github.com/open-mmlab/mim) v0.3.4
- [MMCV](https://github.com/open-mmlab/mmcv) v2.0.0rc4
- [MMEngine](https://github.com/open-mmlab/mmengine) v0.2.0 or higher
- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation) v1.0.0rc5

All the commands below rely on a correctly configured `PYTHONPATH`, which should point to the project's directory so that Python can locate the module files. From the `covid_19_ct_cxr/` root directory, run the following line to add the current directory to `PYTHONPATH`:

```shell
export PYTHONPATH=`pwd`:$PYTHONPATH
```

### Dataset Preparing

- Download the dataset from [here](https://github.com/ieee8023/covid-chestxray-dataset) and decompress it to the path `data/`.
- Run `python tools/prepare_dataset.py` to format the data and change the folder structure as shown below.
- Run `python ../../tools/split_seg_dataset.py` to split the dataset and generate `train.txt`, `val.txt` and `test.txt`. If the labels of the official validation and test sets cannot be obtained, `train.txt` and `val.txt` are generated by randomly splitting the training set.

```shell
mkdir data && cd data
git clone git@github.com:ieee8023/covid-chestxray-dataset.git
cd ..
python tools/prepare_dataset.py
python ../../tools/split_seg_dataset.py
```
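
During preparation, `tools/prepare_dataset.py` remaps the raw mask values to class indices with the mapping `{0: 0, 255: 1}`. The sketch below illustrates that remapping on a toy mask using plain nested lists instead of NumPy arrays; `remap_mask` is a hypothetical stand-in, not the project's own function.

```python
def remap_mask(mask, mapping):
    """Return a copy of ``mask`` with each pixel value replaced via ``mapping``.

    Values that are missing from ``mapping`` fall back to 0 (background).
    """
    return [[mapping.get(value, 0) for value in row] for row in mask]


# Toy 2x3 mask in the raw encoding: 0 = background, 255 = lung.
raw_mask = [[0, 255, 255],
            [0, 0, 255]]
label_mask = remap_mask(raw_mask, {0: 0, 255: 1})
print(label_mask)  # [[0, 1, 1], [0, 0, 1]]
```

Storing class indices rather than 0/255 intensities keeps the mask values aligned with the order of the class names (`background`, `lung`).
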

```none
mmsegmentation
├── mmseg
├── projects
│   ├── medical
│   │   ├── 2d_image
│   │   │   ├── x_ray
│   │   │   │   ├── covid_19_ct_cxr
│   │   │   │   │   ├── configs
│   │   │   │   │   ├── datasets
│   │   │   │   │   ├── tools
│   │   │   │   │   ├── data
│   │   │   │   │   │   ├── train.txt
│   │   │   │   │   │   ├── val.txt
│   │   │   │   │   │   ├── images
│   │   │   │   │   │   │   ├── train
│   │   │   │   │   │   │   │   ├── xxx.png
│   │   │   │   │   │   │   │   ├── ...
│   │   │   │   │   │   │   │   └── xxx.png
│   │   │   │   │   │   ├── masks
│   │   │   │   │   │   │   ├── train
│   │   │   │   │   │   │   │   ├── xxx.png
│   │   │   │   │   │   │   │   ├── ...
│   │   │   │   │   │   │   │   └── xxx.png
```

### Divided Dataset Information

***Note: The split reported in the table below was made by ourselves.***

| Class Name | Num. Train | Pct. Train | Num. Val | Pct. Val | Num. Test | Pct. Test |
| :--------: | :--------: | :--------: | :------: | :------: | :-------: | :-------: |
| background |    164     |   72.88    |    41    |  72.69   |     -     |     -     |
|    lung    |    164     |   27.12    |    41    |  27.31   |     -     |     -     |
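
The 164/41 counts above are consistent with an 80/20 random split of the 205 labeled images. The sketch below shows one way such a split could be produced; the ratio, the seed, the helper name `split_names` and the sample names are all assumptions made for illustration, and the actual `split_seg_dataset.py` may work differently.

```python
import random


def split_names(names, val_ratio=0.2, seed=0):
    """Shuffle sample names reproducibly and split them into (train, val)."""
    names = sorted(names)  # fixed starting order, so the seed decides the split
    rng = random.Random(seed)
    rng.shuffle(names)
    num_val = round(len(names) * val_ratio)
    return names[num_val:], names[:num_val]


# 205 hypothetical sample names standing in for the labeled images.
all_names = [f'case_{i:03d}' for i in range(205)]
train_names, val_names = split_names(all_names)
print(len(train_names), len(val_names))  # 164 41
```

Fixing the seed keeps the split reproducible, which matters when results from different configs are compared on the same validation set.
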

### Training commands

To train a model on a single server with one GPU (default):

```shell
mim train mmseg ./configs/${CONFIG_FILE}
```

### Testing commands

To test a model on a single server with one GPU (default):

```shell
mim test mmseg ./configs/${CONFIG_FILE} --checkpoint ${CHECKPOINT_PATH}
```
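
The project's dataset config requests the `mIoU` and `mDice` metrics for validation and testing. As a rough illustration of what these numbers mean, the sketch below computes per-class IoU and Dice for a single toy prediction and averages them over the two classes. It is a simplified stand-in for illustration only, not the evaluator's implementation, and `iou_and_dice` is a hypothetical helper.

```python
def iou_and_dice(pred, target, class_index):
    """Return (IoU, Dice) of one class for flat sequences of class indices."""
    intersection = sum(
        p == class_index and t == class_index for p, t in zip(pred, target))
    pred_area = sum(p == class_index for p in pred)
    target_area = sum(t == class_index for t in target)
    union = pred_area + target_area - intersection
    iou = intersection / union if union else float('nan')
    total_area = pred_area + target_area
    dice = 2 * intersection / total_area if total_area else float('nan')
    return iou, dice


# Toy flattened prediction and ground truth (0 = background, 1 = lung).
pred = [0, 0, 1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0, 0, 0]

scores = [iou_and_dice(pred, target, c) for c in (0, 1)]
mean_iou = sum(iou for iou, _ in scores) / len(scores)
mean_dice = sum(dice for _, dice in scores) / len(scores)
print(round(mean_iou, 4), round(mean_dice, 4))  # 0.5833 0.7333
```

Dice counts the overlap twice relative to the two areas, so for the same prediction it is never lower than IoU.
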

<!-- List the results as usually done in other model's README. [Example](https://github.com/open-mmlab/mmsegmentation/tree/dev-1.x/configs/fcn#results-and-models)

You should claim whether this is based on the pre-trained weights, which are converted from the official release; or it's a reproduced result obtained from retraining the model in this project. -->

## Checklist

- [x] Milestone 1: PR-ready, and acceptable to be one of the `projects/`.

  - [x] Finish the code
  - [x] Basic docstrings & proper citation
  - [x] Test-time correctness
  - [x] A full README

- [x] Milestone 2: Indicates a successful model implementation.

  - [x] Training-time correctness

- [ ] Milestone 3: Good to be a part of our core package!

  - [ ] Type hints and docstrings
  - [ ] Unit tests
  - [ ] Code polishing
  - [ ] Metafile.yml

- [ ] Move your modules into the core package following the codebase's file hierarchy structure.

- [ ] Refactor your modules into the core package following the codebase's file hierarchy structure.
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
dataset_type = 'Covid19CXRDataset'
data_root = 'data/'
img_scale = (512, 512)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', scale=img_scale, keep_ratio=False),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='PackSegInputs')
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=img_scale, keep_ratio=False),
    # Annotations are loaded after Resize, so the ground-truth mask
    # is not resized at test time.
    dict(type='LoadAnnotations'),
    dict(type='PackSegInputs')
]
train_dataloader = dict(
    batch_size=16,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='InfiniteSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='train.txt',
        data_prefix=dict(img_path='images/', seg_map_path='masks/'),
        pipeline=train_pipeline))
val_dataloader = dict(
    batch_size=1,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='val.txt',
        data_prefix=dict(img_path='images/', seg_map_path='masks/'),
        pipeline=test_pipeline))
# No separate labeled test split exists, so testing reuses the validation set.
test_dataloader = val_dataloader
val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU', 'mDice'])
test_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU', 'mDice'])
@@ -0,0 +1,18 @@
_base_ = [
    'mmseg::_base_/models/fcn_unet_s5-d16.py', './covid-19-ct-cxr_512x512.py',
    'mmseg::_base_/default_runtime.py',
    'mmseg::_base_/schedules/schedule_20k.py'
]
custom_imports = dict(imports='datasets.covid-19-ct-cxr_dataset')
img_scale = (512, 512)
data_preprocessor = dict(size=img_scale)
optimizer = dict(lr=0.01)
optim_wrapper = dict(optimizer=optimizer)
model = dict(
    data_preprocessor=data_preprocessor,
    # Sigmoid variant: a single output channel for the two classes.
    decode_head=dict(
        num_classes=2, loss_decode=dict(use_sigmoid=True), out_channels=1),
    auxiliary_head=None,
    # `_delete_=True` replaces the inherited test_cfg instead of merging into it.
    test_cfg=dict(mode='whole', _delete_=True))
vis_backends = None
visualizer = dict(vis_backends=vis_backends)
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
_base_ = [
    'mmseg::_base_/models/fcn_unet_s5-d16.py', './covid-19-ct-cxr_512x512.py',
    'mmseg::_base_/default_runtime.py',
    'mmseg::_base_/schedules/schedule_20k.py'
]
custom_imports = dict(imports='datasets.covid-19-ct-cxr_dataset')
img_scale = (512, 512)
data_preprocessor = dict(size=img_scale)
optimizer = dict(lr=0.0001)
optim_wrapper = dict(optimizer=optimizer)
model = dict(
    data_preprocessor=data_preprocessor,
    decode_head=dict(num_classes=2),
    auxiliary_head=None,
    test_cfg=dict(mode='whole', _delete_=True))
vis_backends = None
visualizer = dict(vis_backends=vis_backends)
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
_base_ = [
    'mmseg::_base_/models/fcn_unet_s5-d16.py', './covid-19-ct-cxr_512x512.py',
    'mmseg::_base_/default_runtime.py',
    'mmseg::_base_/schedules/schedule_20k.py'
]
custom_imports = dict(imports='datasets.covid-19-ct-cxr_dataset')
img_scale = (512, 512)
data_preprocessor = dict(size=img_scale)
optimizer = dict(lr=0.001)
optim_wrapper = dict(optimizer=optimizer)
model = dict(
    data_preprocessor=data_preprocessor,
    decode_head=dict(num_classes=2),
    auxiliary_head=None,
    test_cfg=dict(mode='whole', _delete_=True))
vis_backends = None
visualizer = dict(vis_backends=vis_backends)
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
_base_ = [
    'mmseg::_base_/models/fcn_unet_s5-d16.py', './covid-19-ct-cxr_512x512.py',
    'mmseg::_base_/default_runtime.py',
    'mmseg::_base_/schedules/schedule_20k.py'
]
custom_imports = dict(imports='datasets.covid-19-ct-cxr_dataset')
img_scale = (512, 512)
data_preprocessor = dict(size=img_scale)
optimizer = dict(lr=0.01)
optim_wrapper = dict(optimizer=optimizer)
model = dict(
    data_preprocessor=data_preprocessor,
    decode_head=dict(num_classes=2),
    auxiliary_head=None,
    test_cfg=dict(mode='whole', _delete_=True))
vis_backends = None
visualizer = dict(vis_backends=vis_backends)
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
from mmseg.datasets import BaseSegDataset
from mmseg.registry import DATASETS


@DATASETS.register_module()
class Covid19CXRDataset(BaseSegDataset):
    """Covid19CXRDataset dataset.

    In segmentation map annotation for Covid19CXRDataset,
    0 stands for background, which is included in 2 categories.
    ``reduce_zero_label`` is fixed to False. The ``img_suffix``
    is fixed to '.png' and ``seg_map_suffix`` is fixed to '.png'.

    Args:
        img_suffix (str): Suffix of images. Default: '.png'
        seg_map_suffix (str): Suffix of segmentation maps. Default: '.png'
        reduce_zero_label (bool): Whether to mark label zero as ignored.
            Default to False.
    """
    METAINFO = dict(classes=('background', 'lung'))

    def __init__(self,
                 img_suffix='.png',
                 seg_map_suffix='.png',
                 reduce_zero_label=False,
                 **kwargs) -> None:
        super().__init__(
            img_suffix=img_suffix,
            seg_map_suffix=seg_map_suffix,
            reduce_zero_label=reduce_zero_label,
            **kwargs)
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
import os

import numpy as np
from PIL import Image

root_path = 'data/'
src_img_dir = os.path.join(root_path, 'covid-chestxray-dataset', 'images')
src_mask_dir = os.path.join(root_path, 'covid-chestxray-dataset',
                            'annotations/lungVAE-masks')
tgt_img_train_dir = os.path.join(root_path, 'images/train/')
tgt_mask_train_dir = os.path.join(root_path, 'masks/train/')
tgt_img_test_dir = os.path.join(root_path, 'images/test/')
# Create the output folders; os.makedirs is portable, unlike `mkdir -p`.
os.makedirs(tgt_img_train_dir, exist_ok=True)
os.makedirs(tgt_mask_train_dir, exist_ok=True)
os.makedirs(tgt_img_test_dir, exist_ok=True)


def convert_label(img, convert_dict):
    """Map each source pixel value in ``convert_dict`` to its class index."""
    arr = np.zeros_like(img, dtype=np.uint8)
    for c, i in convert_dict.items():
        arr[img == c] = i
    return arr


if __name__ == '__main__':

    all_img_names = os.listdir(src_img_dir)
    all_mask_names = os.listdir(src_mask_dir)

    for img_name in all_img_names:
        # Strip the image extension to build the matching mask file name.
        base_name = img_name.replace('.png', '')
        base_name = base_name.replace('.jpg', '')
        base_name = base_name.replace('.jpeg', '')
        mask_name_orig = base_name + '_mask.png'
        if mask_name_orig in all_mask_names:
            # Images with a lung mask form the labeled training set.
            mask_name = base_name + '.png'
            src_img_path = os.path.join(src_img_dir, img_name)
            src_mask_path = os.path.join(src_mask_dir, mask_name_orig)
            # The image keeps its original file name and extension.
            tgt_img_path = os.path.join(tgt_img_train_dir, img_name)
            tgt_mask_path = os.path.join(tgt_mask_train_dir, mask_name)

            img = Image.open(src_img_path).convert('RGB')
            img.save(tgt_img_path)
            mask = np.array(Image.open(src_mask_path))
            # Raw masks use 0/255; the dataset expects class indices 0/1.
            mask = convert_label(mask, {0: 0, 255: 1})
            mask = Image.fromarray(mask)
            mask.save(tgt_mask_path)
        else:
            # Images without a mask go to the unlabeled test folder.
            src_img_path = os.path.join(src_img_dir, img_name)
            tgt_img_path = os.path.join(tgt_img_test_dir, img_name)
            img = Image.open(src_img_path).convert('RGB')
            img.save(tgt_img_path)
