-
-
Notifications
You must be signed in to change notification settings - Fork 84
Introduce mcmc_parcoord #108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -55,6 +55,7 @@ export(mcmc_nuts_energy) | |||
export(mcmc_nuts_stepsize) | |||
export(mcmc_nuts_treedepth) | |||
export(mcmc_pairs) | |||
export(mcmc_parcoord) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mcmc_parcoord_data function too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes good point. I'll add that. In the case that the user specifies the nuts
parameters, do you think it makes sense to return a single data frame that
has a column indicating if there was a divergence, or a list of two data
frames already split?
I guess that's also a general question for implementing the
_data
functions. It would be nice to always return a single data frame for all of
those functions (like what you already did for ppc_intervals/ribbon) but I
wonder if we'll run into cases going forward when that's not the best
option. An alternative would be to always return a list of data frames and
sometimes (usually) that list will only have a single data frame. What do
you think? Whatever we choose it would be good to do it the same way for
all of them.
When possible, return just a dataframe. If a plot requires multiple data-frames, return them in a list. Definitely agree that we should try to be as consistent as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes good point. I'll add that. In the case that the user specifies the nuts
parameters, do you think it makes sense to return a single data frame that
has a column indicating if there was a divergence, or a list of two data
frames already split?
With an indicator column.
Yes good point. I'll add that. In the case that the user specifies the nuts
parameters, do you think it makes sense to return a single data frame that
has a column indicating if there was a divergence, or a list of two data
frames already split?
I guess that's also a general question for implementing the `_data`
functions. It would be nice to always return a single data frame for all of
those functions (like what you already did for ppc_intervals/ribbon) but I
wonder if we'll run into cases going forward when that's not the best
option. An alternative would be to always return a list of data frames and
sometimes (usually) that list will only have a single data frame. What do
you think? Whatever we choose it would be good to do it the same way for
all of them.
…On Tue, Sep 5, 2017 at 3:38 PM TJ Mahr ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In NAMESPACE
<#108 (comment)>:
> @@ -55,6 +55,7 @@ export(mcmc_nuts_energy)
export(mcmc_nuts_stepsize)
export(mcmc_nuts_treedepth)
export(mcmc_pairs)
+export(mcmc_parcoord)
mcmc_parcoord_data function too?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#108 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AHb4Q4OhODe4yV3AUquSBbJUIVOfboZKks5sfaMugaJpZM4PNQyO>
.
|
R/mcmc-parcoord.R
Outdated
|
||
has_divs <- !is.null(np) | ||
if (has_divs) { | ||
np <- validate_nuts_data_frame(np) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Three points.
First, I recommend dplyr::pull(df, value)
or getElement(df, "value)
instead of .$value
. I think the package check will be unhappy with the .
symbol being an undefined global. If you use pull
, then the namespace needs to be updated too.
Second, speaking of undefined globals, Travis has timed out during checks on this branch. Can you confirm a clean package check?
Third, here's how the function might look in the new tidy evaluation system, you can use sym()
to create a symbol and then UQ()
to unquote the variable containing the symbol. (Typically you would write !! x
instead of UQ(x)
but it's best to avoid that shortcut in logical expressions.) Learn more about it from these guides.
if (has_divs) {
param <- sym("Parameter")
value <- sym("Value")
divg <- sym("Divergent")
draws$Divergent <- np %>%
filter(UQ(param) == "divergent__") %>%
pull(UQ(value)) %>%
rep(times = n_param)
np <- validate_nuts_data_frame(np)
div_draws <- filter(draws, UQ(divg) == 1)
draws <- filter(draws, UQ(divg) == 0)
}
This suggestion is not required. This is just a heads-up to check it out and try to get comfortable with it.
@tjmahr Thanks for the comments. Here's an update:
|
You could make an option for normalizing the data to mean 0 and sd 1 ( |
I think this can be done using the existing transformations argument that
all the mcmc_ plotting functions have. That is,
mcmc_parcoord(x, np = np, transformations = function(x) (x - mean(x))/sd(x))
I should add an example to the Examples section of the doc. Alternatively,
there could be an mcmc_parcoord_scaled function that does this
automatically. Might be convenient.
…On Wed, Sep 6, 2017 at 7:03 AM Ari Hartikainen ***@***.***> wrote:
You could make an option for normalizing the data to mean 0 and sd 1 (
(x-mu)/sigma). It is needed if the two parameters are in different scales
(e.g. normal(0,100), normal(150, 0.01))
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#108 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AHb4Q_RlDbtJ9bENLGwC9SKooU_IybxMks5sfnvjgaJpZM4PNQyO>
.
|
[ci skip]
@ahartikainen I added an example of standardizing in 3279ebe. Thanks for the suggestion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mcmc_melt is a great idea. So an "Iteration" is an iteration in a sampling chain. And a "Draw" is sample.
Yeah, well at least that makes sense to me! If we have a sample size of 4000 from 4 stacked chains of 1000 then it seems strange to refer the the 1001st value as the 1001st iteration since it was actually the 1st iteration in one of the chains before the chains were stacked. So yeah I think it makes sense to just use |
(Resolves #107)
This PR introduces the
mcmc_parcoord
function for parallel coordinates plots of MCMC draws. It can also highlight divergences in the plot if NUTS parameter info is passed to thenp
argument. The resulting plot is basically the plot suggested by @ahartikainen:http://discourse.mc-stan.org/t/concentration-of-divergences/1590/21
Example usage: