
Commit fdc0bb9

committed
Add instructional text to the notebook.
1 parent 8a1e55e commit fdc0bb9


notebooks/DemoSegmenter.ipynb

Lines changed: 114 additions & 28 deletions
@@ -1,5 +1,28 @@
 {
  "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Semantic Segmentation Demo\n",
+    "\n",
+    "This is a notebook for running the benchmark semantic segmentation network from the [ADE20K MIT Scene Parsing Benchmark](http://sceneparsing.csail.mit.edu/).\n",
+    "\n",
+    "The code for this notebook is available here:\n",
+    "https://github.com/davidbau/semantic-segmentation-pytorch/tree/tutorial/notebooks\n",
+    "\n",
+    "It can be run on Colab at this URL: https://colab.research.google.com/github/davidbau/semantic-segmentation-pytorch/blob/tutorial/notebooks/DemoSegmenter.ipynb"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Environment Setup\n",
+    "\n",
+    "First, download the code and pretrained models if we are on Colab."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -16,6 +39,15 @@
     "DOWNLOAD_ONLY=1 ./demo_test.sh 2>> install.log"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Imports and utility functions\n",
+    "\n",
+    "We need PyTorch, numpy, and the code for the segmentation model, as well as some utilities for visualizing the data."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -24,13 +56,10 @@
    "source": [
     "# System libs\n",
     "import os\n",
-    "import argparse\n",
-    "from distutils.version import LooseVersion\n",
     "# Numerical libs\n",
-    "import numpy as np\n",
-    "import torch\n",
-    "import torch.nn as nn\n",
+    "import torch, numpy\n",
     "from scipy.io import loadmat\n",
+    "from torchvision import transforms\n",
     "import csv\n",
     "# Our libs\n",
     "from mit_semseg.dataset import TestDataset\n",
@@ -39,16 +68,8 @@
     "from mit_semseg.lib.nn import user_scattered_collate, async_copy_to\n",
     "from mit_semseg.lib.utils import as_numpy\n",
     "from PIL import Image\n",
-    "from tqdm import tqdm\n",
-    "from mit_semseg.config import cfg"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
+    "from mit_semseg.config import cfg\n",
+    "\n",
     "colors = loadmat('data/color150.mat')['colors']\n",
     "names = {}\n",
     "with open('data/object150_info.csv') as f:\n",
@@ -57,18 +78,28 @@
     "    for row in reader:\n",
     "        names[int(row[0])] = row[5].split(\";\")[0]\n",
     "\n",
-    "\n",
     "def visualize_result(data, pred):\n",
     "    (img, info) = data\n",
     "\n",
     "    # colorize prediction\n",
-    "    pred_color = colorEncode(pred, colors).astype(np.uint8)\n",
+    "    pred_color = colorEncode(pred, colors).astype(numpy.uint8)\n",
     "\n",
     "    # aggregate images and save\n",
-    "    im_vis = np.concatenate((img, pred_color), axis=1)\n",
+    "    im_vis = numpy.concatenate((img, pred_color), axis=1)\n",
     "    display(Image.fromarray(im_vis))"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Loading the segmentation model\n",
+    "\n",
+    "Here we load a pretrained segmentation model. Like any PyTorch model, we can call it like a function, or examine the parameters in all its layers.\n",
+    "\n",
+    "After loading, we put it on the GPU. And since we are doing inference, not training, we put the model in eval mode."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -87,24 +118,80 @@
     "    weights='ckpt/ade20k-resnet50dilated-ppm_deepsup/decoder_epoch_20.pth',\n",
     "    use_softmax=True)\n",
     "\n",
-    "crit = nn.NLLLoss(ignore_index=-1)\n",
-    "\n",
+    "crit = torch.nn.NLLLoss(ignore_index=-1)\n",
     "segmentation_module = SegmentationModule(net_encoder, net_decoder, crit)\n",
     "segmentation_module.eval()\n",
-    "segmentation_module.cuda()\n",
+    "segmentation_module.cuda()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Load test data\n",
     "\n",
-    "# Dataset\n",
-    "dataset_test = TestDataset(\n",
-    "    [{'fpath_img': 'ADE_val_00001519.jpg'}], cfg.DATASET)\n",
+    "Now we load and normalize a single test image. Here we use the commonplace convention of normalizing the image to a scale for which the RGB values of a large photo dataset would have zero mean and unit standard deviation. (These numbers come from the ImageNet dataset.) With this normalization, the limiting range of RGB values is within about -2.2 to +2.7."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Load and normalize one image as a singleton tensor batch\n",
+    "pil_to_tensor = transforms.Compose([\n",
+    "    transforms.ToTensor(),\n",
+    "    transforms.Normalize(\n",
+    "        mean=[0.485, 0.456, 0.406], # These are RGB mean+std values\n",
+    "        std=[0.229, 0.224, 0.225])  # across a large photo dataset.\n",
+    "])\n",
+    "pil_image = Image.open('ADE_val_00001519.jpg').convert('RGB')\n",
+    "img_original = numpy.array(pil_image)\n",
+    "img_data = pil_to_tensor(pil_image)\n",
+    "singleton_batch = {'img_data': img_data[None].cuda()}\n",
+    "output_size = img_data.shape[1:]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Run the Model\n",
+    "\n",
+    "Finally we just pass the test image to the segmentation model.\n",
+    "\n",
+    "The segmentation model is coded as a function that takes a dictionary as input, because it needs both the batch of input image data and the desired output segmentation resolution. We ask for full-resolution output.\n",
     "\n",
-    "singleton_batch = {'img_data': dataset_test[0]['img_data'][4].cuda()}\n",
+    "Then we use the previously defined visualize_result function to render the segmentation map."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Run the segmentation at the highest resolution.\n",
     "with torch.no_grad():\n",
-    "    scores = segmentation_module(singleton_batch, segSize=dataset_test[0]['img_ori'].shape[:2])\n",
+    "    scores = segmentation_module(singleton_batch, segSize=output_size)\n",
+    "\n",
+    "# Get the predicted scores for each pixel\n",
     "_, pred = torch.max(scores, dim=1)\n",
     "visualize_result(\n",
-    "    (dataset_test[0]['img_ori'], dataset_test[0]['info']),\n",
-    "    pred.cpu()[0].numpy()\n",
-    ")"
+    "    (img_original, 'ADE_val_00001519.jpg'),\n",
+    "    pred.cpu()[0].numpy())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Run the model at multiple sizes\n",
+    "\n",
+    "One way to get slightly cleaner predictions from a segmentation model is to run the model several times on the same image at different resolutions, and then take the average of the prediction scores.\n",
+    "\n",
+    "This code does that."
    ]
   },
   {
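
Two quick checks, not part of the commit, may help when reading the cells above. The first verifies the "about -2.2 to +2.7" range quoted in the "Load test data" cell by computing the per-channel extremes of the normalization; the second examines a few parameters of the loaded model, as the "Loading the segmentation model" cell says is possible. The second snippet assumes the segmentation_module variable defined in the notebook.

import numpy

# Extremes of the normalized range per RGB channel: a pixel value of 0.0
# maps to (0 - mean) / std, and a value of 1.0 maps to (1 - mean) / std.
mean = numpy.array([0.485, 0.456, 0.406])
std = numpy.array([0.229, 0.224, 0.225])
print((0 - mean) / std)  # approximately [-2.12, -2.04, -1.80]
print((1 - mean) / std)  # approximately [ 2.25,  2.43,  2.64]

# Examine the parameters in a few layers, like any PyTorch module.
# Assumes segmentation_module from the model-loading cell above.
for name, param in list(segmentation_module.named_parameters())[:5]:
    print(name, tuple(param.shape))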

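The multi-scale code cell itself is cut off at the end of this diff view. A minimal sketch of the scores-averaging approach the last markdown cell describes might look like the following; it assumes the segmentation_module, img_data, img_original, output_size, and visualize_result definitions from the cells above, and the actual committed cell may differ.

import torch

# Run the model on several rescaled copies of the image and average the scores.
scales = [0.75, 1.0, 1.25]
scores_sum = 0
with torch.no_grad():
    for s in scales:
        # Rescale the normalized image tensor; asking the model for
        # full-resolution output keeps the score maps aligned for averaging.
        scaled = torch.nn.functional.interpolate(
            img_data[None], scale_factor=s,
            mode='bilinear', align_corners=False)
        batch = {'img_data': scaled.cuda()}
        scores_sum = scores_sum + segmentation_module(batch, segSize=output_size)

scores_avg = scores_sum / len(scales)
_, pred = torch.max(scores_avg, dim=1)
visualize_result((img_original, 'ADE_val_00001519.jpg'), pred.cpu()[0].numpy())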