Incorrect computation in Posterior Predictive Checking section of BYM Notebook #675


Closed
jessegrabowski opened this issue Jul 26, 2024 · 8 comments · Fixed by #686

Comments

@jessegrabowski
Member

Notebook title: The Besag-York-Mollie Model for Spatial Data
Notebook url: https://www.pymc.io/projects/examples/en/latest/spatial/nyc_bym.html

Issue description

I believe this computation is invalid:

phi_pred = idata.posterior.phi.mean(("chain", "draw")).values
beta0_pred = idata.posterior.beta0.mean(("chain", "draw")).values
sigma_pred = idata.posterior.sigma.mean(("chain", "draw")).values
y_predict = np.exp(log_E + beta0_pred + sigma_pred * (1 / scaling_factor) * phi_pred)

Expected output

This code computes f(E[x]), but the quantity of interest, the mean prediction conditioned on rho = 1, is E[f(x)]. That is, the expectation should be taken last, only after computing the exp of the samples. The plug-in result is biased downward, since Jensen's inequality for the convex function exp gives f(E[x]) <= E[f(x)].
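A quick numpy-only illustration of the bias, using synthetic normal samples as stand-ins for posterior draws (none of these names come from the notebook):

```python
import numpy as np

# Synthetic stand-ins for posterior draws of a log-scale quantity.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=10_000)

f_of_mean = np.exp(x.mean())  # f(E[x]): exponentiate the posterior mean
mean_of_f = np.exp(x).mean()  # E[f(x)]: average the exponentiated draws

# For convex f = exp, Jensen's inequality guarantees mean_of_f >= f_of_mean,
# so the plug-in estimate understates the predictive mean.
print(f_of_mean, mean_of_f)
```

With x ~ N(0, 1), E[exp(x)] = exp(1/2) ≈ 1.65 while exp(E[x]) ≈ 1, so the gap is far from negligible.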

Proposed solution

This is actually a really great place to use pm.do. I propose:

with pm.do(BYM_model, {'rho': 1.0}):
    y_predict_rho_1 = pm.sample_posterior_predictive(
        idata,
        var_names=['rho', 'mixture', 'mu'],
        predictions=True,
        extend_inferencedata=False,
    )
y_predict = y_predict_rho_1.predictions.mu.mean(dim=['chain', 'draw'])

This would require wrapping mixture and mu in pm.Deterministic in the model code as well.
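For readers without the model at hand, here is a numpy-only sketch of the fix's central point: exponentiate every posterior draw first, then average over chains and draws. All names, shapes, and values here (n_chains, log_E, scaling_factor, etc.) are made-up stand-ins, not the notebook's actual data:

```python
import numpy as np

# Fake posterior draws, shaped (chain, draw, area) like ArviZ arrays.
rng = np.random.default_rng(42)
n_chains, n_draws, n_areas = 4, 250, 3
phi = rng.normal(size=(n_chains, n_draws, n_areas))
beta0 = rng.normal(size=(n_chains, n_draws, 1))
sigma = np.abs(rng.normal(size=(n_chains, n_draws, 1)))
log_E = np.log([100.0, 250.0, 80.0])  # made-up expected counts
scaling_factor = 0.5                  # made-up BYM2 scaling factor

# Correct order, E[f(x)]: exp per draw, then mean over (chain, draw).
mu_draws = np.exp(log_E + beta0 + sigma * (1 / scaling_factor) * phi)
y_predict = mu_draws.mean(axis=(0, 1))

# Biased plug-in version, f(E[x]): mean the parameters, then exp once.
y_plugin = np.exp(
    log_E
    + beta0.mean()
    + sigma.mean() * (1 / scaling_factor) * phi.mean(axis=(0, 1))
)
```

The pm.sample_posterior_predictive route in the proposal does the same thing, except PyMC evaluates mu per draw inside the do-transformed model, so no manual broadcasting is needed.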

@ricardoV94
Member

Sounds good

@bwengals
Collaborator

bwengals commented Aug 1, 2024

Tagging @daniel-saunders-phil

@daniel-saunders-phil
Contributor

Ah that makes sense! Thanks for noticing that.

@daniel-saunders-phil
Contributor

@jessegrabowski just to help me understand the behavior of pm.sample_posterior_predictive(): does it make a difference if we pass rho in var_names? It seems to yield equivalent results if we just pass 'mu', 'mixture'.

@OriolAbril
Member

OriolAbril commented Aug 1, 2024

I am quite sure adding "rho" to var_names or not will have no effect, as long as the call is within the pm.do: there it should no longer be a volatile variable (I think that was the name), and volatile variables are the ones that get resampled. Does it get printed in the logged output?

@jessegrabowski
Member Author

No, it shouldn't, because after you do pm.do, rho isn't random anymore. I think I put it in because I was paranoid and overly verbose.

There's a really long (and very good) explanation of how sample_posterior_predictive works in the docstring that Ricardo wrote up

@daniel-saunders-phil
Contributor

> I am quite sure adding "rho" or not to var_names will have no effect as long as it is within the pm.do as there it should no longer be a volatile variable (I think that was the name) which are the variables that get resampled. Does it get printed in the logged output?

By "logged output" do you mean the Sampling: [ ] output that shows up under a cell, typically with a list of variable names? If so, no, it doesn't show up either way. So that's the trick to knowing which variables are volatile?

Otherwise, whether it shows up in the resulting inference data depends on if you include it as a var_name. When you include it, you just get a list of 1s.

@OriolAbril
Member

Yes and yes.

var_names is probably taking care of too many things currently. Now that you have used it here and run into these unexpected/somewhat surprising differences, I think you should go over to pymc-devs/pymc#7069 and comment on whether that would have helped you/reduced confusion.
