Rootograms to assess model fit of count data models #28

paul-buerkner · 2016-09-04T11:52:30Z

I recently stumbled upon a blog post about rootograms (http://www.fromthebottomoftheheap.net/2016/06/07/rootograms/), which help in assessing model fit of count data models. Given that ppc_dens_overlay is not ideal in vizualizing discrete distributions, rootograms may be a nice alternative. What do you think?

The text was updated successfully, but these errors were encountered:

jgabry · 2016-09-04T19:03:15Z

Hadn't seen those before, looks interesting. I think even for count
models some combination of the available PPC plots could probably diagnose
the same type of misfit, but rootograms seem like a more convenient way to
do it. So yeah, probably a good addition to the package.

On Sunday, September 4, 2016, Paul-Christian Bürkner <
[email protected]> wrote:

I recently stumbled upon a blog post about rootograms (http://www.
fromthebottomoftheheap.net/2016/06/07/rootograms/), which help in
assessing model fit of count data models. Given that ppc_dens_overlay is
not ideal in vizualizing discrete distributions, rootograms may be a nice
alternative. What do you think?

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#28, or mute the thread
https://github.com/notifications/unsubscribe-auth/AHb4Q0Xcy82jGdkJbqLX7DYrfHNaxgiEks5qmrD-gaJpZM4J0hJp
.

jgabry · 2016-09-06T16:29:37Z

I started playing around with this, but I'm going to stop working on it until after we get the first release out. I do think it will be a nice addition though.

jgabry · 2016-12-16T18:56:48Z

@paul-buerkner Any interest in trying to write this function? If not no worries.

paul-buerkner · 2016-12-16T21:07:50Z

Yeah, I have actually some code lying around somewhere, but it is not yet "bayesplot ready". I will try to get this done in the near future :-)

jgabry · 2016-12-16T21:38:45Z

Ok cool thanks Paul!

paul-buerkner · 2016-12-16T22:47:13Z

So the problem I was facing, were those "hanging" rootograms. "Suspended" and "standing" rootograms won't be much of an issue. Did you ever try to making histograms "hanging" from a line with ggplot? If yes, I would be glad for a pointer in the right direction ;-)

andrewgelman · 2016-12-16T22:48:31Z

Personally, I think hanging rootograms are a silly gimmick!

…

On Dec 16, 2016, at 5:47 PM, Paul-Christian Bürkner ***@***.***> wrote: So the problem I was facing, were those "hanging" rootograms. "Suspended" and "standing" rootograms won't be much of an issue. Did you ever try to making histograms "hanging" from a line with ggplot? If yes, I would be glad for a pointer in the right direction ;-) — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#28 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ADOBUTpzRiVyPgF2wbUugoPr-g5oEFJ7ks5rIxTxgaJpZM4J0hJp>.

paul-buerkner · 2016-12-16T22:49:49Z

But you think rootograms in general can make sense, right?

andrewgelman · 2016-12-16T22:50:41Z

Sure, I guess, but I'd be happier just framing as a residual plot. But I haven't thought too hard about these issues.

…

On Dec 16, 2016, at 5:49 PM, Paul-Christian Bürkner ***@***.***> wrote: But you think rootograms in general can make sense, right? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#28 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ADOBUfSL46WW-gXa7Icy_cn8tLMnqjvQks5rIxWNgaJpZM4J0hJp>.

jgabry · 2016-12-16T23:01:35Z

@andrewgelman Is the gimmick that it's "hanging" or do you not like rootograms? If the former we don't necessarily need to offer that. If the latter maybe there's a different type of plot we should be doing for count data?

jgabry · 2016-12-16T23:02:14Z

@andrewgelman Nevermind, I didn't see your second comment

jgabry · 2016-12-16T23:12:30Z

I don't know much about rootograms, but the square root scale is somewhat appealing. Could be useful because it adjusts to some extent for the fact that there can be very small and very large counts in the same outcome vector y in some cases (and thus both very small and very large deviations between y and yrep that need to be visualized in the same plot).

jgabry · 2016-12-16T23:17:22Z

@paul-buerkner Maybe don't worry about "hanging" ones, but the rootogram is probably worth at least prototyping. At a minimum we'll be able to compare to residual plots and see if we find any cases where we can diagnose a problem using the rootogram better than we can using the residual plot.

paul-buerkner · 2016-12-16T23:21:56Z

Sounds good.

paul-buerkner · 2016-12-17T20:18:04Z

Here is an initial prototype function implementing rootograms.

# TODO: document
ppc_rootogram <- function(y, yrep, style = c("standing", "suspended"), ...) {
  require(ggplot2)
  style <- match.arg(style)
  # TODO: check validity of y and yrep
  ymax <- max(y, yrep)
  x <- 0L:ymax
  
  # prepare a table for yrep
  tyrep <- apply(yrep, 1, table)
  for (i in seq_along(tyrep)) {
    tyrep[[i]] <- as.numeric(tyrep[[i]][match(x, rownames(tyrep[[i]]))])
  }
  tyrep <- do.call(rbind, tyrep)
  tyrep[is.na(tyrep)] <- 0
  tyexp <- sqrt(colMeans(tyrep))
  
  # prepare a table for y
  ty <- table(y)
  ty <- sqrt(as.numeric(ty[match(x, rownames(ty))]))
  if (style == "suspended") {
    ty <- ty - tyexp
  }
  ty[is.na(ty)] <- 0
  
  # TODO: enforce bayesplot color schemes
  ggplot(data.frame(x, tyexp, ty)) + 
    geom_histogram(aes_string("x", "ty"), stat = "identity",
                   fill = "lightblue", color = "blue") +
    geom_line(aes_string("x", "tyexp"), size = 1.3) +
    bayesplot::theme_default() +
    xlab("y") +
    ylab("sqrt(Frequency)")
}

This should help us in discussing whether it is worth implementing rootograms in bayesplot. Attached are some figures generated with the above function using the data from the blog post and brms to fit the models.

Interestingly, the plots shown in the blog post only display counts from 0 to 30, but the actually response variable goes over 100. I have tried it with the latest version of the countreg package from RForge and it is still the case. I don't know whether it is a bug or a feature but at least it appears strange to omit 2/3 of the actually response range.

ppc_rootogram_standing_1.pdf
ppc_rootogram_standing_2.pdf
ppc_rootogram_suspended_1.pdf

jgabry · 2016-12-19T19:18:19Z

Thanks @paul-buerkner. A few questions/comments

Can you show uncertainty in the plot in some way?
I started a branch feature/ppc_rootogram and put your function there with a few edits (bayesplot colors, verifying y and yrep, etc). So feel free to work off that branch going forward (assuming you like the edits).
I used light colors for the bars and dark for the line, which seemed to be most natural. But typically for the PPC plots I use light for yrep and dark for y, so this goes against that tendency. But maybe that will be too confusing?

paul-buerkner · 2016-12-19T22:06:07Z

Thanks for adding the branch. Your changes look great! I think making the line dark is intuitive and much better than making a light line and a dark histogram even if is goes against consistency.

I have added uncertainty intervals based on quantiles, but they look pretty bad for few predicted counts. Maybe we can display uncertainty in a way that is somewhat "smoother" than quantiles of counts. I have added color "m" for the intervals for now, but I think this is probably not the best idea...

Shell I make a pull request so we can work on it together? (I don't have the permission to push to the branch directly)

jgabry · 2016-12-19T23:03:59Z

Are you interested in contributing code to bayesplot going forward (I guess beyond just the rootogram function)? If so then I should see about getting you push access.

…

On Mon, Dec 19, 2016 at 5:06 PM, Paul-Christian Bürkner < ***@***.***> wrote: Thanks for adding the branch. Your changes look great! I think making the line dark is intuitive and much better than making a light line and a dark histogram even if is goes against consistency. I have added uncertainty intervals based on quantiles, but they look pretty bad for few predicted counts. Maybe we can display uncertainty in a way that is somewhat "smoother" than quantiles of counts. I have added color "m" for the intervals for now, but I think this is probably not the best idea... Shell I make a pull request so we can work on it together? (I don't have the permission to push to the branch directly) — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#28 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AHb4Q5UYQhaL2kMBD7Ur3I8FTr_fqezYks5rJv_QgaJpZM4J0hJp> .

paul-buerkner · 2016-12-19T23:20:34Z

I am definitely interested. I will have to learn some of the inner structure of bayesplot to contribute to certain parts of the package, but that shouldn't be too difficult.

paul-buerkner · 2016-12-20T00:00:07Z

I have also tried mean +- sd to visualize uncertainty, which is much smoother, but of course isn't very reasonable for counts. I think we should stick to quantiles, but we may want to explicitely document the option to turn them off completely by setting prob = 0.

jgabry · 2016-12-20T00:13:52Z

Hmm, yeah it's a bit tricky with counts. I'll take a look at what you pushed to the branch.

…

On Mon, Dec 19, 2016 at 7:00 PM, Paul-Christian Bürkner < ***@***.***> wrote: I have also tried mean +- sd to visualize uncertainty, which is much smoother, but of course isn't very reasonable for counts. I think we should stick to quantiles, but we may want to explicitely document the option to turn them off completely by setting prob = 0. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#28 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AHb4Q44pS8QYZ9WiMLFytVYiE-bFvz9Rks5rJxqHgaJpZM4J0hJp> .

paul-buerkner · 2016-12-20T21:47:48Z

I have now also added the "hanging" histogram gimmick just to see how it looks like. Feel free to remove it if you don't like it.

paul-buerkner · 2017-01-02T17:14:53Z

@jgabry I think the rootogram branch is ready for a PR. Shell I open one, or do you want to have a look first?

jgabry · 2017-01-02T18:39:32Z

Cool. Go for it and then I'll take a look and make comments on the PR.

jgabry · 2017-01-10T23:06:06Z

closed by #62

jgabry added the feature label Sep 5, 2016

jgabry closed this as completed Jan 10, 2017

jgabry added the new plot label Aug 7, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rootograms to assess model fit of count data models #28

Rootograms to assess model fit of count data models #28

paul-buerkner commented Sep 4, 2016

jgabry commented Sep 4, 2016

jgabry commented Sep 6, 2016

jgabry commented Dec 16, 2016

paul-buerkner commented Dec 16, 2016

jgabry commented Dec 16, 2016

paul-buerkner commented Dec 16, 2016

andrewgelman commented Dec 16, 2016 via email

paul-buerkner commented Dec 16, 2016

andrewgelman commented Dec 16, 2016 via email

jgabry commented Dec 16, 2016

jgabry commented Dec 16, 2016 •

edited

Loading

jgabry commented Dec 16, 2016 •

edited

Loading

jgabry commented Dec 16, 2016

paul-buerkner commented Dec 16, 2016

paul-buerkner commented Dec 17, 2016

jgabry commented Dec 19, 2016

paul-buerkner commented Dec 19, 2016

jgabry commented Dec 19, 2016 via email

paul-buerkner commented Dec 19, 2016

paul-buerkner commented Dec 20, 2016

jgabry commented Dec 20, 2016 via email

paul-buerkner commented Dec 20, 2016

paul-buerkner commented Jan 2, 2017

jgabry commented Jan 2, 2017

jgabry commented Jan 10, 2017

Rootograms to assess model fit of count data models #28

Rootograms to assess model fit of count data models #28

Comments

paul-buerkner commented Sep 4, 2016

jgabry commented Sep 4, 2016

jgabry commented Sep 6, 2016

jgabry commented Dec 16, 2016

paul-buerkner commented Dec 16, 2016

jgabry commented Dec 16, 2016

paul-buerkner commented Dec 16, 2016

andrewgelman commented Dec 16, 2016 via email

paul-buerkner commented Dec 16, 2016

andrewgelman commented Dec 16, 2016 via email

jgabry commented Dec 16, 2016

jgabry commented Dec 16, 2016 • edited Loading

jgabry commented Dec 16, 2016 • edited Loading

jgabry commented Dec 16, 2016

paul-buerkner commented Dec 16, 2016

paul-buerkner commented Dec 17, 2016

jgabry commented Dec 19, 2016

paul-buerkner commented Dec 19, 2016

jgabry commented Dec 19, 2016 via email

paul-buerkner commented Dec 19, 2016

paul-buerkner commented Dec 20, 2016

jgabry commented Dec 20, 2016 via email

paul-buerkner commented Dec 20, 2016

paul-buerkner commented Jan 2, 2017

jgabry commented Jan 2, 2017

jgabry commented Jan 10, 2017

jgabry commented Dec 16, 2016 •

edited

Loading

jgabry commented Dec 16, 2016 •

edited

Loading