|
1 | 1 | { |
2 | 2 | "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# Semantic Segmentation Demo\n", |
| 8 | + "\n", |
| 9 | + "This is a notebook for running the benchmark semantic segmentation network from the the [ADE20K MIT Scene Parsing Benchchmark](http://sceneparsing.csail.mit.edu/).\n", |
| 10 | + "\n", |
| 11 | + "The code for this notebook is available here\n", |
| 12 | + "https://github.com/davidbau/semantic-segmentation-pytorch/tree/tutorial/notebooks\n", |
| 13 | + "\n", |
| 14 | + "It can be run on Colab at this URL https://colab.research.google.com/github/davidbau/semantic-segmentation-pytorch/blob/tutorial/notebooks/DemoSegmenter.ipynb" |
| 15 | + ] |
| 16 | + }, |
| 17 | + { |
| 18 | + "cell_type": "markdown", |
| 19 | + "metadata": {}, |
| 20 | + "source": [ |
| 21 | + "### Environment Setup\n", |
| 22 | + "\n", |
| 23 | + "First, download the code and pretrained models if we are on colab." |
| 24 | + ] |
| 25 | + }, |
3 | 26 | { |
4 | 27 | "cell_type": "code", |
5 | 28 | "execution_count": null, |
|
16 | 39 | "DOWNLOAD_ONLY=1 ./demo_test.sh 2>> install.log" |
17 | 40 | ] |
18 | 41 | }, |
| 42 | + { |
| 43 | + "cell_type": "markdown", |
| 44 | + "metadata": {}, |
| 45 | + "source": [ |
| 46 | + "## Imports and utility functions\n", |
| 47 | + "\n", |
| 48 | + "We need pytorch, numpy, and the code for the segmentation model. And some othe utilities for visualizing the data." |
| 49 | + ] |
| 50 | + }, |
19 | 51 | { |
20 | 52 | "cell_type": "code", |
21 | 53 | "execution_count": null, |
|
24 | 56 | "source": [ |
25 | 57 | "# System libs\n", |
26 | 58 | "import os\n", |
27 | | - "import argparse\n", |
28 | | - "from distutils.version import LooseVersion\n", |
29 | 59 | "# Numerical libs\n", |
30 | | - "import numpy as np\n", |
31 | | - "import torch\n", |
32 | | - "import torch.nn as nn\n", |
| 60 | + "import torch, numpy\n", |
33 | 61 | "from scipy.io import loadmat\n", |
| 62 | + "from torchvision import transforms\n", |
34 | 63 | "import csv\n", |
35 | 64 | "# Our libs\n", |
36 | 65 | "from mit_semseg.dataset import TestDataset\n", |
|
39 | 68 | "from mit_semseg.lib.nn import user_scattered_collate, async_copy_to\n", |
40 | 69 | "from mit_semseg.lib.utils import as_numpy\n", |
41 | 70 | "from PIL import Image\n", |
42 | | - "from tqdm import tqdm\n", |
43 | | - "from mit_semseg.config import cfg" |
44 | | - ] |
45 | | - }, |
46 | | - { |
47 | | - "cell_type": "code", |
48 | | - "execution_count": null, |
49 | | - "metadata": {}, |
50 | | - "outputs": [], |
51 | | - "source": [ |
| 71 | + "from mit_semseg.config import cfg\n", |
| 72 | + "\n", |
52 | 73 | "colors = loadmat('data/color150.mat')['colors']\n", |
53 | 74 | "names = {}\n", |
54 | 75 | "with open('data/object150_info.csv') as f:\n", |
|
57 | 78 | " for row in reader:\n", |
58 | 79 | " names[int(row[0])] = row[5].split(\";\")[0]\n", |
59 | 80 | "\n", |
60 | | - "\n", |
61 | 81 | "def visualize_result(data, pred):\n", |
62 | 82 | " (img, info) = data\n", |
63 | 83 | "\n", |
64 | 84 | " # colorize prediction\n", |
65 | | - " pred_color = colorEncode(pred, colors).astype(np.uint8)\n", |
| 85 | + " pred_color = colorEncode(pred, colors).astype(numpy.uint8)\n", |
66 | 86 | "\n", |
67 | 87 | " # aggregate images and save\n", |
68 | | - " im_vis = np.concatenate((img, pred_color), axis=1)\n", |
| 88 | + " im_vis = numpy.concatenate((img, pred_color), axis=1)\n", |
69 | 89 | " display(Image.fromarray(im_vis))" |
70 | 90 | ] |
71 | 91 | }, |
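| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "As a quick peek (a sketch added for illustration, not part of the original demo), we can print the first few of the 150 ADE20K label names we just loaded:" |
| | + ] |
| | + }, |
| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "# Show the first few label names; indices start at 1 in object150_info.csv.\n", |
| | + "for c in range(1, 6):\n", |
| | + "    print(c, names[c])" |
| | + ] |
| | + }, |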
| 92 | + { |
| 93 | + "cell_type": "markdown", |
| 94 | + "metadata": {}, |
| 95 | + "source": [ |
| 96 | + "## Loading the segmentation model\n", |
| 97 | + "\n", |
| 98 | + "Here we load a pretrained segmentation model. Like any pytorch model, we can call it like a function, or example the parameters in all the layers.\n", |
| 99 | + "\n", |
| 100 | + "After loading, we put it on the GPU. And since we are doing inference, not training, we put the model in eval mode." |
| 101 | + ] |
| 102 | + }, |
72 | 103 | { |
73 | 104 | "cell_type": "code", |
74 | 105 | "execution_count": null, |
|
87 | 118 | " weights='ckpt/ade20k-resnet50dilated-ppm_deepsup/decoder_epoch_20.pth',\n", |
88 | 119 | " use_softmax=True)\n", |
89 | 120 | "\n", |
90 | | - "crit = nn.NLLLoss(ignore_index=-1)\n", |
91 | | - "\n", |
| 121 | + "crit = torch.nn.NLLLoss(ignore_index=-1)\n", |
92 | 122 | "segmentation_module = SegmentationModule(net_encoder, net_decoder, crit)\n", |
93 | 123 | "segmentation_module.eval()\n", |
94 | | - "segmentation_module.cuda()\n", |
| 124 | + "segmentation_module.cuda()" |
| 125 | + ] |
| 126 | + }, |
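| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "For example (a quick sketch added here, not part of the original demo), we can examine a few of the pretrained parameters by name:" |
| | + ] |
| | + }, |
| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "# List the first few parameter tensors by name and shape.\n", |
| | + "for i, (name, p) in enumerate(segmentation_module.named_parameters()):\n", |
| | + "    print(name, tuple(p.shape))\n", |
| | + "    if i >= 4:\n", |
| | + "        break" |
| | + ] |
| | + }, |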
| 127 | + { |
| 128 | + "cell_type": "markdown", |
| 129 | + "metadata": {}, |
| 130 | + "source": [ |
| 131 | + "## Load test data\n", |
95 | 132 | "\n", |
96 | | - "# Dataset\n", |
97 | | - "dataset_test = TestDataset(\n", |
98 | | - " [{'fpath_img': 'ADE_val_00001519.jpg'}], cfg.DATASET)\n", |
| 133 | + "Now we load and normalize a single test image. Here we use the commonplace convention of normalizing the image to a scale for which the RGB values of a large photo dataset would have zero mean and unit standard deviation. (These numbers come from the imagenet dataset.) With this normalization, the limiiting ranges of RGB values are within about (-2.2 to +2.7)." |
| 134 | + ] |
| 135 | + }, |
| 136 | + { |
| 137 | + "cell_type": "code", |
| 138 | + "execution_count": null, |
| 139 | + "metadata": {}, |
| 140 | + "outputs": [], |
| 141 | + "source": [ |
| 142 | + "# Load and normalize one image as a singleton tensor batch\n", |
| 143 | + "pil_to_tensor = transforms.Compose([\n", |
| 144 | + " transforms.ToTensor(),\n", |
| 145 | + " transforms.Normalize(\n", |
| 146 | + " mean=[0.485, 0.456, 0.406], # These are RGB mean+std values\n", |
| 147 | + " std=[0.229, 0.224, 0.225]) # across a large photo dataset.\n", |
| 148 | + "])\n", |
| 149 | + "img_data = pil_to_tensor(\n", |
| 150 | + " Image.open('ADE_val_00001519.jpg').convert('RGB'))\n", |
| 151 | + "singleton_batch = {'img_data': img_data[None].cuda()}\n", |
| 152 | + "output_size = img_data.shape[1:]" |
| 153 | + ] |
| 154 | + }, |
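| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "As a quick sanity check (a sketch added here, not part of the original demo), normalizing the extreme pixel values 0.0 and 1.0 with the same mean and std recovers the range quoted above:" |
| | + ] |
| | + }, |
| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "# Extreme pixel values 0.0 and 1.0 map to roughly -2.1 and +2.7.\n", |
| | + "mean = numpy.array([0.485, 0.456, 0.406])\n", |
| | + "std = numpy.array([0.229, 0.224, 0.225])\n", |
| | + "print('low: ', (0.0 - mean) / std)   # about [-2.12, -2.04, -1.80]\n", |
| | + "print('high:', (1.0 - mean) / std)   # about [ 2.25,  2.43,  2.64]" |
| | + ] |
| | + }, |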
| 155 | + { |
| 156 | + "cell_type": "markdown", |
| 157 | + "metadata": {}, |
| 158 | + "source": [ |
| 159 | + "## Run the Model\n", |
| 160 | + "\n", |
| 161 | + "Finally we just pass the test image to the segmentation model.\n", |
| 162 | + "\n", |
| 163 | + "The segmentation model is coded as a function that takes a dictionary as input, because it wants to know both the input batch image data as well as the desired output segmentation resolution. We ask for full resolution output.\n", |
99 | 164 | "\n", |
100 | | - "singleton_batch = {'img_data': dataset_test[0]['img_data'][4].cuda()}\n", |
| 165 | + "Then we use the previously-defined visualize_result function to render the semgnatioon map." |
| 166 | + ] |
| 167 | + }, |
| 168 | + { |
| 169 | + "cell_type": "code", |
| 170 | + "execution_count": null, |
| 171 | + "metadata": {}, |
| 172 | + "outputs": [], |
| 173 | + "source": [ |
| 174 | + "# Run the segmentation at the highest resolution.\n", |
101 | 175 | "with torch.no_grad():\n", |
102 | | - " scores = segmentation_module(singleton_batch, segSize=dataset_test[0]['img_ori'].shape[:2])\n", |
| 176 | + " scores = segmentation_module(singleton_batch, segSize=output_size)\n", |
| 177 | + " \n", |
| 178 | + "# Get the predicted scores for each pixel\n", |
103 | 179 | "_, pred = torch.max(scores, dim=1)\n", |
104 | 180 | "visualize_result(\n", |
105 | 181 | " (dataset_test[0]['img_ori'], dataset_test[0]['info']),\n", |
106 | | - " pred.cpu()[0].numpy()\n", |
107 | | - ")" |
| 182 | + " pred.cpu()[0].numpy())" |
| 183 | + ] |
| 184 | + }, |
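| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "As a side note (a sketch added for illustration, not part of the original demo): because the decoder was built with use_softmax=True, scores holds one probability per class per pixel, so its 150 channels should sum to one everywhere:" |
| | + ] |
| | + }, |
| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "# scores has shape (1, 150, height, width); channels sum to ~1 per pixel.\n", |
| | + "print(scores.shape)\n", |
| | + "print(scores.sum(dim=1).mean().item())" |
| | + ] |
| | + }, |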
| 185 | + { |
| 186 | + "cell_type": "markdown", |
| 187 | + "metadata": {}, |
| 188 | + "source": [ |
| 189 | + "### Run the model at multiple sizes\n", |
| 190 | + "\n", |
| 191 | + "One way to get slightly cleaner predictions from a segmentation model is to run the model several times on the same image at different resolutions, and then take the average of the scores for prredictions.\n", |
| 192 | + "\n", |
| 193 | + "This code does that." |
108 | 194 | ] |
109 | 195 | }, |
110 | 196 | { |
|