| 
77 | 77 |     "Note that we generate the data with a fixed observation noise $\\sigma = 0.1$.\n",  | 
78 | 78 |     "\n",  | 
79 | 79 |     "## Regression\n",  | 
80 |  | -    "Now let's define our regression model in the form of a neural network. We'll use PyTorch's `nn.Module` for this.  Our input $X$ is a matrix of size $N \\times p$ and our output $y$ is a vector of size $p \\times 1$.  The function `nn.Linear(p, 1)` defines a linear transformation of the form $Xw + b$ where $w$ is the weight matrix and $b$ is the additive bias."  | 
 | 80 | +    "Now let's define our regression model. We'll use PyTorch's `nn.Module` for this.  Our input $X$ is a matrix of size $N \\times p$ and our output $y$ is a vector of size $p \\times 1$.  The function `nn.Linear(p, 1)` defines a linear transformation of the form $Xw + b$ where $w$ is the weight matrix and $b$ is the additive bias."  | 
81 | 81 |    ]  | 
82 | 82 |   },  | 
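 |  | +  {  | 
 |  | +   "cell_type": "markdown",  | 
 |  | +   "metadata": {},  | 
 |  | +   "source": [  | 
 |  | +    "To make the shapes concrete, here is a small sketch of what `nn.Linear(p, 1)` computes (the sizes `N` and `p` below are chosen purely for illustration):\n",  | 
 |  | +    "\n",  | 
 |  | +    "```python\n",  | 
 |  | +    "import torch\n",  | 
 |  | +    "import torch.nn as nn\n",  | 
 |  | +    "from torch.autograd import Variable\n",  | 
 |  | +    "\n",  | 
 |  | +    "N, p = 5, 1  # illustrative sizes only\n",  | 
 |  | +    "linear = nn.Linear(p, 1)         # holds a weight w of shape (1, p) and a bias b of shape (1,)\n",  | 
 |  | +    "X = Variable(torch.randn(N, p))  # N data points with p features each\n",  | 
 |  | +    "y_hat = linear(X)                # computes Xw + b for every row of X\n",  | 
 |  | +    "print(y_hat.size())              # torch.Size([5, 1]), i.e. N x 1\n",  | 
 |  | +    "```"  | 
 |  | +   ]  | 
 |  | +  },  | 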
83 | 83 |   {  | 
 | 
167 | 167 |    "cell_type": "markdown",  | 
168 | 168 |    "metadata": {},  | 
169 | 169 |    "source": [  | 
170 |  | -    "Not too bad - you can see that the neural net learned parameters that were pretty close to the ground truth of $w = 3, b = 1$.  But how confident should we be in these point estimates?\n",  | 
 | 170 | +    "Not too bad - you can see that the regressor learned parameters that were pretty close to the ground truth of $w = 3, b = 1$.  But how confident should we be in these point estimates?\n",  | 
171 | 171 |     "\n",  | 
172 | 172 |     "Bayesian modeling (see [here](http://mlg.eng.cam.ac.uk/zoubin/papers/NatureReprint15.pdf) for an overview) offers a systematic framework for reasoning about model uncertainty. Instead of just learning point estimates, we're going to learn a _distribution_ over values of the parameters $w$ and $b$ that are consistent with the observed data."  | 
173 | 173 |    ]  | 
 | 
182 | 182 |     "\n",  | 
183 | 183 |     "### `random_module()`\n",  | 
184 | 184 |     "\n",  | 
185 |  | -    "In order to do this, we'll 'lift' the parameters $w$ and $b$ to random variables. We can do this in Pyro via `random_module()`, which effectively takes a `nn.Module` and turns it into a distribution over neural networks. Specifically, each parameter in the original neural net is sampled from the provided prior. This allows us to repurpose vanilla neural nets for use in the Bayesian setting. For example:"  | 
 | 185 | +    "In order to do this, we'll 'lift' the parameters $w$ and $b$ to random variables. We can do this in Pyro via `random_module()`, which effectively takes a `nn.Module` and turns it into a distribution over regressors. Specifically, each parameter in the original regression model is sampled from the provided prior. This allows us to repurpose vanilla regression models for use in the Bayesian setting. For example:"  | 
186 | 186 |    ]  | 
187 | 187 |   },  | 
188 | 188 |   {  | 
 | 
195 | 195 |     "sigma = Variable(torch.ones(1, 1))\n",  | 
196 | 196 |     "# define a unit normal prior\n",  | 
197 | 197 |     "prior = Normal(mu, sigma)\n",  | 
198 |  | -    "# overload the parameters in the regression nn with samples from the prior\n",  | 
 | 198 | +    "# overload the parameters in the regression module with samples from the prior\n",  | 
199 | 199 |     "lifted_module = pyro.random_module(\"regression_module\", regression_model, prior)\n",  | 
200 |  | -    "# sample a nn from the prior\n",  | 
201 |  | -    "sampled_nn = lifted_module()"  | 
 | 200 | +    "# sample a regressor from the prior\n",  | 
 | 201 | +    "sampled_reg_model = lifted_module()"  | 
202 | 202 |    ]  | 
203 | 203 |   },  | 
204 | 204 |   {  | 
 | 
207 | 207 |    "source": [  | 
208 | 208 |     "### Model\n",  | 
209 | 209 |     "\n",  | 
210 |  | -    "We now have all the ingredients needed to specify our model. First we define priors over $w$ and $b$.  Because we're uncertain about the parameters a priori, we'll use relatively wide priors $\\mathcal{N}(\\mu = 0, \\sigma = 10)$.  Then we wrap `regression_model` with `random_module` and sample an instance of the neural net, `lifted_nn`. We then run the neural net forward on the inputs `x_data`. Finally we use the `pyro.observe` statement to condition on the observed data `y_data`. Note that we use the same fixed observation "  | 
 | 210 | +    "We now have all the ingredients needed to specify our model. First we define priors over $w$ and $b$.  Because we're uncertain about the parameters a priori, we'll use relatively wide priors $\\mathcal{N}(\\mu = 0, \\sigma = 10)$.  Then we wrap `regression_model` with `random_module` and sample an instance of the regressor, `lifted_rm`. We then run the regressor forward on the inputs `x_data`. Finally we use the `pyro.observe` statement to condition on the observed data `y_data`. Note that we use the same fixed observation "  | 
211 | 211 |    ]  | 
212 | 212 |   },  | 
213 | 213 |   {  | 
 | 
224 | 224 |     "    bias_mu, bias_sigma = Variable(torch.zeros(1)), Variable(10 * torch.ones(1))\n",  | 
225 | 225 |     "    w_prior, b_prior = Normal(mu, sigma), Normal(bias_mu, bias_sigma)\n",  | 
226 | 226 |     "    priors = {'linear.weight': w_prior, 'linear.bias': b_prior}\n",  | 
227 |  | -    "    # lift module parameters to random variables\n",  | 
 | 227 | +    "    # lift module parameters to random variables sampled from the priors\n",  | 
228 | 228 |     "    lifted_module = pyro.random_module(\"module\", regression_model, priors)\n",  | 
229 |  | -    "    # sample a nn (which also samples w and b)\n",  | 
230 |  | -    "    lifted_nn = lifted_module()\n",  | 
231 |  | -    "    # run the nn forward\n",  | 
232 |  | -    "    latent = lifted_nn(x_data).squeeze()\n",  | 
 | 229 | +    "    # sample a regressor (which also samples w and b)\n",  | 
 | 230 | +    "    lifted_reg_model = lifted_module()\n",  | 
 | 231 | +    "    # run the regressor forward conditioned on data\n",  | 
 | 232 | +    "    prediction_mean = lifted_reg_model(x_data).squeeze()\n",  | 
233 | 233 |     "    # condition on the observed data\n",  | 
234 |  | -    "    pyro.observe(\"obs\", Normal(latent, Variable(0.1 * torch.ones(data.size(0)))),\n",  | 
 | 234 | +    "    pyro.observe(\"obs\", Normal(prediction_mean, Variable(0.1 * torch.ones(data.size(0)))),\n",  | 
235 | 235 |     "                 y_data.squeeze())"  | 
236 | 236 |    ]  | 
237 | 237 |   },  | 
 | 
272 | 272 |     "    # overload the parameters in the module with random samples \n",  | 
273 | 273 |     "    # from the guide distributions\n",  | 
274 | 274 |     "    lifted_module = pyro.random_module(\"module\", regression_model, dists)\n",  | 
275 |  | -    "    # sample a nn (which also samples w and b)\n",  | 
 | 275 | +    "    # sample a regressor (which also samples w and b)\n",  | 
276 | 276 |     "    return lifted_module()"  | 
277 | 277 |    ]  | 
278 | 278 |   },  | 
279 | 279 |   {  | 
280 | 280 |    "cell_type": "markdown",  | 
281 | 281 |    "metadata": {},  | 
282 | 282 |    "source": [  | 
283 |  | -    "Note that we choose Gaussians for both guide distributions. Also, to ensure positivity, we pass each log sigma through a `softplus()` transformation.\n",  | 
 | 283 | +    "Note that we choose Gaussians for both guide distributions. Also, to ensure positivity, we pass each log sigma through a `softplus()` transformation (an alternative to ensure positivity would be an `exp()`-transformation).\n",  | 
284 | 284 |     "\n",  | 
285 | 285 |     "## Inference\n",  | 
286 | 286 |     "\n",  | 
 | 
369 | 369 |    "cell_type": "markdown",  | 
370 | 370 |    "metadata": {},  | 
371 | 371 |    "source": [  | 
372 |  | -    "Finally, let's evaluate our model by checking its predictive accuracy on new test data. This is known as _point evaluation_.  We'll sample 20 neural nets from our posterior and run them on the new test data, then average across their predictions and calculate the MSE of the predicted values compared to the ground truth."  | 
 | 372 | +    "Finally, let's evaluate our model by checking its predictive accuracy on new test data. This is known as _point evaluation_.  We'll sample 20 regressors from our posterior and run them on the new test data, then average across their predictions and calculate the MSE of the predicted values compared to the ground truth."  | 
373 | 373 |    ]  | 
374 | 374 |   },  | 
375 | 375 |   {  | 
 | 
404 | 404 |     "```"  | 
405 | 405 |    ]  | 
406 | 406 |   },  | 
 | 407 | +  {  | 
 | 408 | +   "cell_type": "markdown",  | 
 | 409 | +   "metadata": {},  | 
 | 410 | +   "source": [  | 
 | 411 | +    "Bayesian nonlinear regression can also be implemented analogously by lifting neural network modules."  | 
 | 412 | +   ]  | 
 | 413 | +  },  | 
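 |  | +  {  | 
 |  | +   "cell_type": "markdown",  | 
 |  | +   "metadata": {},  | 
 |  | +   "source": [  | 
 |  | +    "For instance, a rough sketch of lifting a small feedforward network might look like the following (the class name, layer sizes, and site name are made up for illustration and do not appear elsewhere in this tutorial):\n",  | 
 |  | +    "\n",  | 
 |  | +    "```python\n",  | 
 |  | +    "import torch\n",  | 
 |  | +    "import torch.nn as nn\n",  | 
 |  | +    "from torch.autograd import Variable\n",  | 
 |  | +    "import pyro\n",  | 
 |  | +    "from pyro.distributions import Normal\n",  | 
 |  | +    "\n",  | 
 |  | +    "class NonlinearRegressionModel(nn.Module):  # hypothetical example class\n",  | 
 |  | +    "    def __init__(self, p, hidden=10):\n",  | 
 |  | +    "        super(NonlinearRegressionModel, self).__init__()\n",  | 
 |  | +    "        self.fc1 = nn.Linear(p, hidden)\n",  | 
 |  | +    "        self.fc2 = nn.Linear(hidden, 1)\n",  | 
 |  | +    "    def forward(self, x):\n",  | 
 |  | +    "        return self.fc2(torch.tanh(self.fc1(x)))\n",  | 
 |  | +    "\n",  | 
 |  | +    "nonlinear_model = NonlinearRegressionModel(1)\n",  | 
 |  | +    "\n",  | 
 |  | +    "# one wide normal prior per parameter tensor, with matching shapes\n",  | 
 |  | +    "priors = {name: Normal(Variable(torch.zeros(par.size())),\n",  | 
 |  | +    "                       Variable(10 * torch.ones(par.size())))\n",  | 
 |  | +    "          for name, par in nonlinear_model.named_parameters()}\n",  | 
 |  | +    "# lift the network and sample one instance from the prior\n",  | 
 |  | +    "lifted_nn = pyro.random_module(\"nonlinear_module\", nonlinear_model, priors)\n",  | 
 |  | +    "sampled_nn = lifted_nn()\n",  | 
 |  | +    "```"  | 
 |  | +   ]  | 
 |  | +  },  | 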
407 | 414 |   {  | 
408 | 415 |    "cell_type": "markdown",  | 
409 | 416 |    "metadata": {},  | 
 | 