-
-
Notifications
You must be signed in to change notification settings - Fork 84
Rootograms to assess model fit of count data models #28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Hadn't seen those before, looks interesting. I think even for count On Sunday, September 4, 2016, Paul-Christian Bürkner <
|
I started playing around with this, but I'm going to stop working on it until after we get the first release out. I do think it will be a nice addition though. |
@paul-buerkner Any interest in trying to write this function? If not no worries. |
Yeah, I have actually some code lying around somewhere, but it is not yet "bayesplot ready". I will try to get this done in the near future :-) |
Ok cool thanks Paul! |
So the problem I was facing, were those "hanging" rootograms. "Suspended" and "standing" rootograms won't be much of an issue. Did you ever try to making histograms "hanging" from a line with ggplot? If yes, I would be glad for a pointer in the right direction ;-) |
Personally, I think hanging rootograms are a silly gimmick!
… On Dec 16, 2016, at 5:47 PM, Paul-Christian Bürkner ***@***.***> wrote:
So the problem I was facing, were those "hanging" rootograms. "Suspended" and "standing" rootograms won't be much of an issue. Did you ever try to making histograms "hanging" from a line with ggplot? If yes, I would be glad for a pointer in the right direction ;-)
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub <#28 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ADOBUTpzRiVyPgF2wbUugoPr-g5oEFJ7ks5rIxTxgaJpZM4J0hJp>.
|
But you think rootograms in general can make sense, right? |
Sure, I guess, but I'd be happier just framing as a residual plot. But I haven't thought too hard about these issues.
… On Dec 16, 2016, at 5:49 PM, Paul-Christian Bürkner ***@***.***> wrote:
But you think rootograms in general can make sense, right?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub <#28 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ADOBUfSL46WW-gXa7Icy_cn8tLMnqjvQks5rIxWNgaJpZM4J0hJp>.
|
@andrewgelman Is the gimmick that it's "hanging" or do you not like rootograms? If the former we don't necessarily need to offer that. If the latter maybe there's a different type of plot we should be doing for count data? |
@andrewgelman Nevermind, I didn't see your second comment |
I don't know much about rootograms, but the square root scale is somewhat appealing. Could be useful because it adjusts to some extent for the fact that there can be very small and very large counts in the same outcome vector |
@paul-buerkner Maybe don't worry about "hanging" ones, but the rootogram is probably worth at least prototyping. At a minimum we'll be able to compare to residual plots and see if we find any cases where we can diagnose a problem using the rootogram better than we can using the residual plot. |
Sounds good. |
Here is an initial prototype function implementing rootograms. # TODO: document
ppc_rootogram <- function(y, yrep, style = c("standing", "suspended"), ...) {
require(ggplot2)
style <- match.arg(style)
# TODO: check validity of y and yrep
ymax <- max(y, yrep)
x <- 0L:ymax
# prepare a table for yrep
tyrep <- apply(yrep, 1, table)
for (i in seq_along(tyrep)) {
tyrep[[i]] <- as.numeric(tyrep[[i]][match(x, rownames(tyrep[[i]]))])
}
tyrep <- do.call(rbind, tyrep)
tyrep[is.na(tyrep)] <- 0
tyexp <- sqrt(colMeans(tyrep))
# prepare a table for y
ty <- table(y)
ty <- sqrt(as.numeric(ty[match(x, rownames(ty))]))
if (style == "suspended") {
ty <- ty - tyexp
}
ty[is.na(ty)] <- 0
# TODO: enforce bayesplot color schemes
ggplot(data.frame(x, tyexp, ty)) +
geom_histogram(aes_string("x", "ty"), stat = "identity",
fill = "lightblue", color = "blue") +
geom_line(aes_string("x", "tyexp"), size = 1.3) +
bayesplot::theme_default() +
xlab("y") +
ylab("sqrt(Frequency)")
} This should help us in discussing whether it is worth implementing rootograms in bayesplot. Attached are some figures generated with the above function using the data from the blog post and brms to fit the models. Interestingly, the plots shown in the blog post only display counts from 0 to 30, but the actually response variable goes over 100. I have tried it with the latest version of the countreg package from RForge and it is still the case. I don't know whether it is a bug or a feature but at least it appears strange to omit 2/3 of the actually response range. ppc_rootogram_standing_1.pdf |
Thanks @paul-buerkner. A few questions/comments
|
Thanks for adding the branch. Your changes look great! I think making the line dark is intuitive and much better than making a light line and a dark histogram even if is goes against consistency. I have added uncertainty intervals based on quantiles, but they look pretty bad for few predicted counts. Maybe we can display uncertainty in a way that is somewhat "smoother" than quantiles of counts. I have added color "m" for the intervals for now, but I think this is probably not the best idea... Shell I make a pull request so we can work on it together? (I don't have the permission to push to the branch directly) |
Are you interested in contributing code to bayesplot going forward (I guess
beyond just the rootogram function)? If so then I should see about getting
you push access.
…On Mon, Dec 19, 2016 at 5:06 PM, Paul-Christian Bürkner < ***@***.***> wrote:
Thanks for adding the branch. Your changes look great! I think making the
line dark is intuitive and much better than making a light line and a dark
histogram even if is goes against consistency.
I have added uncertainty intervals based on quantiles, but they look
pretty bad for few predicted counts. Maybe we can display uncertainty in a
way that is somewhat "smoother" than quantiles of counts. I have added
color "m" for the intervals for now, but I think this is probably not the
best idea...
Shell I make a pull request so we can work on it together? (I don't have
the permission to push to the branch directly)
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#28 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AHb4Q5UYQhaL2kMBD7Ur3I8FTr_fqezYks5rJv_QgaJpZM4J0hJp>
.
|
I am definitely interested. I will have to learn some of the inner structure of bayesplot to contribute to certain parts of the package, but that shouldn't be too difficult. |
I have also tried mean +- sd to visualize uncertainty, which is much smoother, but of course isn't very reasonable for counts. I think we should stick to quantiles, but we may want to explicitely document the option to turn them off completely by setting |
Hmm, yeah it's a bit tricky with counts. I'll take a look at what you
pushed to the branch.
…On Mon, Dec 19, 2016 at 7:00 PM, Paul-Christian Bürkner < ***@***.***> wrote:
I have also tried mean +- sd to visualize uncertainty, which is much
smoother, but of course isn't very reasonable for counts. I think we should
stick to quantiles, but we may want to explicitely document the option to
turn them off completely by setting prob = 0.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#28 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AHb4Q44pS8QYZ9WiMLFytVYiE-bFvz9Rks5rJxqHgaJpZM4J0hJp>
.
|
I have now also added the "hanging" histogram gimmick just to see how it looks like. Feel free to remove it if you don't like it. |
@jgabry I think the rootogram branch is ready for a PR. Shell I open one, or do you want to have a look first? |
Cool. Go for it and then I'll take a look and make comments on the PR. |
closed by #62 |
I recently stumbled upon a blog post about rootograms (http://www.fromthebottomoftheheap.net/2016/06/07/rootograms/), which help in assessing model fit of count data models. Given that
ppc_dens_overlay
is not ideal in vizualizing discrete distributions, rootograms may be a nice alternative. What do you think?The text was updated successfully, but these errors were encountered: