@@ -53,8 +53,8 @@ In particular, the ridge model is the same as the OLS model:
 
     \mathbf{y} = \mathbf{bX} + \mathbf{\epsilon}
 
-where :math:`\epsilon \sim \mathcal{N}(0, \sigma^2 I)`, except now the error
-for the model is calculated as
+where :math:`\epsilon \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})`,
+except now the error for the model is calculated as
 
 .. math::
 
@@ -66,9 +66,9 @@ the adjusted normal equation:
 .. math::
 
     \hat{\mathbf{b}}_{Ridge} =
-        (\mathbf{X}^\top \mathbf{X} + \alpha I)^{-1} \mathbf{X}^\top \mathbf{y}
+        (\mathbf{X}^\top \mathbf{X} + \alpha \mathbf{I})^{-1} \mathbf{X}^\top \mathbf{y}
 
-where :math:`(\mathbf{X}^\top \mathbf{X} + \alpha I)^{-1}
+where :math:`(\mathbf{X}^\top \mathbf{X} + \alpha \mathbf{I})^{-1}
 \mathbf{X}^\top` is the pseudoinverse / Moore-Penrose inverse adjusted for
 the `L2` penalty on the model coefficients.
 
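As a sketch of the adjusted normal equation above (a hypothetical `ridge_coefficients` helper, assuming numpy; not a definitive implementation):

```python
import numpy as np

def ridge_coefficients(X, y, alpha=1.0):
    """Closed-form ridge solution b = (X^T X + alpha * I)^{-1} X^T y."""
    N, M = X.shape
    A = X.T @ X + alpha * np.eye(M)
    # solve the linear system rather than forming the inverse explicitly
    return np.linalg.solve(A, X.T @ y)

# usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
b_true = np.array([1.0, -2.0, 0.5])
y = X @ b_true + 0.1 * rng.normal(size=100)
b_hat = ridge_coefficients(X, y, alpha=0.1)
```

For small `alpha` and well-conditioned data, `b_hat` stays close to the OLS solution; larger `alpha` shrinks the coefficients toward zero.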
@@ -81,7 +81,7 @@ the `L2` penalty on the model coefficients.
 <h2>Bayesian Linear Regression</h2>
 
 In its general form, Bayesian linear regression extends the simple linear
-regression model by introducing priors on model parameters b and/or the
+regression model by introducing priors on model parameters *b* and/or the
 error variance :math:`\sigma^2`.
 
 The introduction of a prior allows us to quantify the uncertainty in our
@@ -98,7 +98,7 @@ data :math:`X^*` with the posterior predictive distribution:
 
 .. math::
 
-    p(y^* \mid X^*, X, Y) = \int_{b} p(y^* \mid X^*, b) p(b \mid X, y) db
+    p(y^* \mid X^*, X, Y) = \int_{b} p(y^* \mid X^*, b) p(b \mid X, y) \ \text{d}b
 
 Depending on the choice of prior it may be impossible to compute an
 analytic form for the posterior / posterior predictive distribution. In
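When the integral above has no analytic form, one common fallback is a Monte Carlo approximation: draw coefficient samples from the posterior (e.g., via MCMC) and push each through the likelihood. A hypothetical numpy sketch, assuming posterior samples of `b` are already available:

```python
import numpy as np

def mc_posterior_predictive(X_star, b_samples, sigma=1.0, seed=0):
    """Approximate draws from p(y* | X*, X, y) given posterior samples of b.

    X_star:    (n_new, M) matrix of new inputs
    b_samples: (n_samples, M) posterior draws of the coefficients
    """
    rng = np.random.default_rng(seed)
    preds = X_star @ b_samples.T  # (n_new, n_samples) mean predictions
    # add observation noise to turn mean predictions into predictive draws
    return preds + sigma * rng.normal(size=preds.shape)

# usage with made-up "posterior" samples
samples = np.random.default_rng(1).normal(size=(500, 2))
y_star = mc_posterior_predictive(np.ones((4, 2)), samples, sigma=0.5)
```

Each column of `y_star` is one draw from the approximate posterior predictive; summarizing across columns gives predictive means and intervals.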
@@ -116,11 +116,11 @@ prior on `b` is Gaussian. A common parameterization is:
 
 .. math::
 
-    b \mid \sigma, b_V \sim \mathcal{N}(b_{mean}, \sigma^2 b_V)
+    b \mid \sigma, V \sim \mathcal{N}(\mu, \sigma^2 V)
 
-where :math:`b_{mean}`, :math:`\sigma` and :math:`b_V` are hyperparameters. Ridge
-regression is a special case of this model where :math:`b_{mean} = 0`,
-:math:`\sigma = 1` and :math:`b_V = I` (ie., the prior on `b` is a zero-mean,
+where :math:`\mu`, :math:`\sigma` and :math:`V` are hyperparameters. Ridge
+regression is a special case of this model where :math:`\mu = 0`,
+:math:`\sigma = 1` and :math:`V = I` (i.e., the prior on *b* is a zero-mean,
 unit covariance Gaussian).
 
@@ -129,22 +129,22 @@ parameters:
 
 .. math::
 
-    A &= (b_V^{-1} + X^\top X)^{-1} \\
-    \mu_b &= A b_V^{-1} b_{mean} + A X^\top y \\
-    \text{cov}_b &= \sigma^2 A \\
+    A &= (V^{-1} + X^\top X)^{-1} \\
+    \mu_b &= A V^{-1} \mu + A X^\top y \\
+    \Sigma_b &= \sigma^2 A \\
 
 The model posterior is then
 
 .. math::
 
-    b \mid X, y \sim \mathcal{N}(\mu_b, \text{cov}_b)
+    b \mid X, y \sim \mathcal{N}(\mu_b, \Sigma_b)
 
 We can also compute a closed-form solution for the posterior predictive distribution as
 well:
 
 .. math::
 
-    y^* \mid X^*, X, Y \sim \mathcal{N}(X^* \mu_b, \ \ X^* \text{cov}_b X^{* \top} + I)
+    y^* \mid X^*, X, Y \sim \mathcal{N}(X^* \mu_b, \ \ X^* \Sigma_b X^{* \top} + I)
 
 where :math:`X^*` is the matrix of new data we wish to predict, and :math:`y^*`
 are the predicted targets for those data.
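A numpy sketch of these closed-form updates (a hypothetical `gaussian_posterior` helper, following the parameterization above with known `sigma`):

```python
import numpy as np

def gaussian_posterior(X, y, mu, V, sigma=1.0):
    """Posterior N(mu_b, Sigma_b) over b under the prior b ~ N(mu, sigma^2 V)."""
    A = np.linalg.inv(np.linalg.inv(V) + X.T @ X)
    mu_b = A @ np.linalg.solve(V, mu) + A @ X.T @ y
    Sigma_b = sigma**2 * A
    return mu_b, Sigma_b

# ridge as a special case: mu = 0, sigma = 1, V = I recovers the ridge
# solution with alpha = 1
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
mu_b, Sigma_b = gaussian_posterior(X, y, mu=np.zeros(3), V=np.eye(3))
```

With this choice of hyperparameters, `mu_b` matches the ridge point estimate exactly, while `Sigma_b` additionally quantifies the uncertainty around it.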
@@ -160,7 +160,7 @@ are the predicted targets for those data.
 
 --------------------------------
 
-If *both* b and the error variance :math:`\sigma^2` are unknown, the
+If *both* *b* and the error variance :math:`\sigma^2` are unknown, the
 conjugate prior for the Gaussian likelihood is the Normal-Gamma
 distribution (univariate likelihood) or the Normal-Inverse-Wishart
 distribution (multivariate likelihood).
@@ -169,22 +169,22 @@ distribution (multivariate likelihood).
 
 .. math::
 
-    b, \sigma^2 &\sim \text{NG}(b_{mean}, b_V, \alpha, \beta) \\
+    b, \sigma^2 &\sim \text{NG}(\mu, V, \alpha, \beta) \\
     \sigma^2 &\sim \text{InverseGamma}(\alpha, \beta) \\
-    b \mid \sigma^2 &\sim \mathcal{N}(b_{mean}, \sigma^2 b_V)
+    b \mid \sigma^2 &\sim \mathcal{N}(\mu, \sigma^2 V)
 
-where :math:`\alpha, \beta, b_V`, and :math:`b_{mean}` are
-parameters of the prior.
+where :math:`\alpha, \beta, V`, and :math:`\mu` are parameters of the
+prior.
 
 **Multivariate**
 
 .. math::
 
-    b, \Sigma &\sim \mathcal{NIW}(b_{mean}, \lambda, \Psi, \rho) \\
+    b, \Sigma &\sim \mathcal{NIW}(\mu, \lambda, \Psi, \rho) \\
     \Sigma &\sim \mathcal{W}^{-1}(\Psi, \rho) \\
-    b \mid \Sigma &\sim \mathcal{N}(b_{mean}, \frac{1}{\lambda} \Sigma)
+    b \mid \Sigma &\sim \mathcal{N}(\mu, \frac{1}{\lambda} \Sigma)
 
-where :math:`b_{mean}, \lambda, \Psi`, and :math:`\rho` are
+where :math:`\mu, \lambda, \Psi`, and :math:`\rho` are
 parameters of the prior.
 
 
@@ -194,30 +194,30 @@ parameters:
 
 .. math::
 
-    B &= y - X b_{mean} \\
+    B &= y - X \mu \\
     \text{shape} &= N + \alpha \\
-    \text{scale} &= \frac{1}{\text{shape}} (\alpha \beta + B^\top (X b_V X^\top + I)^{-1} B) \\
+    \text{scale} &= \frac{1}{\text{shape}} (\alpha \beta + B^\top (X V X^\top + I)^{-1} B) \\
 
 where
 
 .. math::
 
     \sigma^2 \mid X, y &\sim \text{InverseGamma}(\text{shape}, \text{scale}) \\
-    A &= (b_V^{-1} + X^\top X)^{-1} \\
-    \mu_b &= A b_V^{-1} b_{mean} + A X^\top y \\
-    \text{cov}_b &= \sigma^2 A
+    A &= (V^{-1} + X^\top X)^{-1} \\
+    \mu_b &= A V^{-1} \mu + A X^\top y \\
+    \Sigma_b &= \sigma^2 A
 
 The model posterior is then
 
 .. math::
 
-    b \mid X, y, \sigma^2 \sim \mathcal{N}(\mu_b, \text{cov}_b)
+    b \mid X, y, \sigma^2 \sim \mathcal{N}(\mu_b, \Sigma_b)
 
 We can also compute a closed-form solution for the posterior predictive distribution:
 
 .. math::
 
-    y^* \mid X^*, X, Y \sim \mathcal{N}(X^* \mu_b, \ X^* \text{cov}_b X^{* \top} + I)
+    y^* \mid X^*, X, Y \sim \mathcal{N}(X^* \mu_b, \ X^* \Sigma_b X^{* \top} + I)
 
 **Models**
 