Incorrect computation in Posterior Predictive Checking section of BYM Notebook #675


Closed
jessegrabowski opened this issue Jul 26, 2024 · 8 comments · Fixed by #686

Comments

@jessegrabowski
Member

Notebook title: The Besag-York-Mollie Model for Spatial Data
Notebook url: https://www.pymc.io/projects/examples/en/latest/spatial/nyc_bym.html

Issue description

I believe this computation is invalid:

phi_pred = idata.posterior.phi.mean(("chain", "draw")).values
beta0_pred = idata.posterior.beta0.mean(("chain", "draw")).values
sigma_pred = idata.posterior.sigma.mean(("chain", "draw")).values
y_predict = np.exp(log_E + beta0_pred + sigma_pred * (1 / scaling_factor) * phi_pred)

Expected output

This code computes f(E[x]), but the quantity of interest, the mean prediction conditioned on rho = 1, is E[f(x)]. That is, the expectation should be taken last, only after computing the exp of the samples. The plug-in result is biased downward, since Jensen's inequality for the convex function exp gives f(E[x]) <= E[f(x)].
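A quick numpy-only illustration of the bias, using synthetic normal samples as stand-ins for posterior draws (none of these names come from the notebook):

```python
import numpy as np

# Synthetic stand-ins for posterior draws of a log-scale quantity.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=10_000)

f_of_mean = np.exp(x.mean())  # f(E[x]): exponentiate the posterior mean
mean_of_f = np.exp(x).mean()  # E[f(x)]: average the exponentiated draws

# For convex f = exp, Jensen's inequality guarantees mean_of_f >= f_of_mean,
# so the plug-in estimate understates the predictive mean.
print(f_of_mean, mean_of_f)
```

With x ~ N(0, 1), E[exp(x)] = exp(1/2) ≈ 1.65 while exp(E[x]) ≈ 1, so the gap is far from negligible.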

Proposed solution

This is actually a really great place to use pm.do. I propose:

with pm.do(BYM_model, {'rho': 1.0}):
    y_predict_rho_1 = pm.sample_posterior_predictive(
        idata,
        var_names=['rho', 'mixture', 'mu'],
        predictions=True,
        extend_inferencedata=False,
    )
y_predict = y_predict_rho_1.predictions.mu.mean(dim=['chain', 'draw'])

This would require wrapping mixture and mu in pm.Deterministic in the model code as well.
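For readers without the model at hand, here is a numpy-only sketch of the fix's central point: exponentiate every posterior draw first, then average over chains and draws. All names, shapes, and values here (n_chains, log_E, scaling_factor, etc.) are made-up stand-ins, not the notebook's actual data:

```python
import numpy as np

# Fake posterior draws, shaped (chain, draw, area) like ArviZ arrays.
rng = np.random.default_rng(42)
n_chains, n_draws, n_areas = 4, 250, 3
phi = rng.normal(size=(n_chains, n_draws, n_areas))
beta0 = rng.normal(size=(n_chains, n_draws, 1))
sigma = np.abs(rng.normal(size=(n_chains, n_draws, 1)))
log_E = np.log([100.0, 250.0, 80.0])  # made-up expected counts
scaling_factor = 0.5                  # made-up BYM2 scaling factor

# Correct order, E[f(x)]: exp per draw, then mean over (chain, draw).
mu_draws = np.exp(log_E + beta0 + sigma * (1 / scaling_factor) * phi)
y_predict = mu_draws.mean(axis=(0, 1))

# Biased plug-in version, f(E[x]): mean the parameters, then exp once.
y_plugin = np.exp(
    log_E
    + beta0.mean()
    + sigma.mean() * (1 / scaling_factor) * phi.mean(axis=(0, 1))
)
```

The pm.sample_posterior_predictive route in the proposal does the same thing, except PyMC evaluates mu per draw inside the do-transformed model, so no manual broadcasting is needed.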

@ricardoV94
Member

Sounds good

@bwengals
Collaborator

bwengals commented Aug 1, 2024

Tagging @daniel-saunders-phil

@daniel-saunders-phil
Contributor

Ah that makes sense! Thanks for noticing that.

@daniel-saunders-phil
Contributor

@jessegrabowski just to help me understand the behavior of pm.sample_posterior_predictive(): does it make a difference if we pass rho in var_names? It seems to yield equivalent results if we just pass 'mu', 'mixture'.

@OriolAbril
Member

OriolAbril commented Aug 1, 2024

I am quite sure adding "rho" to var_names or not will have no effect, as long as the call is within the pm.do: there it should no longer be a volatile variable (I think that was the name), and volatile variables are the ones that get resampled. Does it get printed in the logged output?

@jessegrabowski
Member Author

No, it shouldn't, because after you do pm.do, rho isn't random anymore. I think I put it in because I was paranoid and overly verbose.

There's a really long (and very good) explanation of how sample_posterior_predictive works in the docstring that Ricardo wrote up

@daniel-saunders-phil
Contributor

> I am quite sure adding "rho" or not to var_names will have no effect as long as it is within the pm.do as there it should no longer be a volatile variable (I think that was the name) which are the variables that get resampled. Does it get printed in the logged output?

By "logged output" do you mean the Sampling: [ ] output that shows up under a cell, typically with a list of variable names? If so, no, it doesn't show up either way. So that's the trick to knowing which variables are volatile?

Otherwise, whether it shows up in the resulting inference data depends on if you include it as a var_name. When you include it, you just get a list of 1s.

@OriolAbril
Member

Yes and yes.

var_names is probably taking care of too many things currently. Now that you have used it here and run into these unexpected/somewhat surprising differences, I think you should go over to pymc-devs/pymc#7069 and comment on whether that would have helped you/reduced confusion.
