-
Notifications
You must be signed in to change notification settings - Fork 1
Join implementation #15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Looking good so far. I think the happy path is solid. The edge case for groups is when there is a common column but it's not used for joining, in which case the result has a da <- starwars[, c("name", "eye_color", "height", "mass")][1:10, ]
da <- group_by(da, eye_color)
db <- starwars[, c("name", "eye_color", "homeworld")]
Da <- as(da, "DataFrame")
Da <- group_by(Da, eye_color)
Db <- as(db, "DataFrame")
# these all work, preserving group
left_join(da, db)
right_join(da, db)
inner_join(da, db[1:3, ])
# these all give the right answers, group preserved
left_join(Da, Db)
right_join(Da, Db)
inner_join(Da, Db[1:3, ])
# none of these retain the group because it's no longer a column, but they don't error
left_join(da, db, by = "name")
right_join(da, db, by = "name")
inner_join(da, db[1:3, ], by = "name")
# these currently fail because they try to group by a non-existent column
left_join(Da, Db, by = "name")
right_join(Da, Db, by = "name")
inner_join(Da, Db[1:3, ], by = "name") I think grps <- intersect(grps, colnames(x)) # <-- preserve remaining groups
# if no grouping return object
if(length(grps) == 0)
return(x)
# else rebuild groups with new DF
group_by(x, !!!rlang::syms(grps)) but please add some tests for this case (and confirm that it works). |
For the rownames issue, is_leftish <- function(...) {
# does this look like a non-right join?
args <- list(...)
if (utils::hasName(args, "all.y") && args$all.y) return(FALSE)
TRUE
}
join_internal <- function(x, y, by = NULL, ...) {
use_rownames <- is_leftish(...)
if (use_rownames) x$...rownames <- rownames(x)
if (is.null(by))
by <- check_common_columns(names(x), names(y))
grps <- group_vars(x)
x <- S4Vectors::merge(x, y, by = by, sort = FALSE, ...)
if (use_rownames) rownames(x) <- x$...rownames
if (use_rownames) x$...rownames <- NULL
grps <- intersect(grps, colnames(x))
# if no grouping return object
if(length(grps) == 0)
return(x)
# else rebuild groups with new DF
group_by(x, !!!rlang::syms(grps))
} Produces this: library(S4Vectors)
library(DFplyr)
da <- starwars[, c("name", "eye_color", "height", "mass")][1:10, ]
da <- group_by(da, eye_color)
db <- starwars[, c("name", "eye_color", "homeworld")]
Da <- as(da, "DataFrame")
Da <- group_by(Da, eye_color)
rownames(Da) <- paste0("row", 1:10)
Db <- as(db, "DataFrame")
left_join(Da, Db)
#> Joining with `by = c("name", "eye_color")`
#> DataFrame with 10 rows and 5 columns
#> Groups: eye_color
#> name eye_color height mass homeworld
#> <character> <character> <integer> <numeric> <character>
#> row1 Luke Skywalker blue 172 77 Tatooine
#> row2 C-3PO yellow 167 75 Tatooine
#> row3 R2-D2 red 96 32 Naboo
#> row4 Darth Vader yellow 202 136 Tatooine
#> row5 Leia Organa brown 150 49 Alderaan
#> row6 Owen Lars blue 178 120 Tatooine
#> row7 Beru Whitesun Lars blue 165 75 Tatooine
#> row8 R5-D4 red 97 32 Tatooine
#> row9 Biggs Darklighter brown 183 84 Tatooine
#> row10 Obi-Wan Kenobi blue-gray 182 77 Stewjon
right_join(Da, Db)
#> Joining with `by = c("name", "eye_color")`
#> DataFrame with 87 rows and 5 columns
#> Groups: eye_color
#> name eye_color height mass homeworld
#> <character> <character> <integer> <numeric> <character>
#> 1 Luke Skywalker blue 172 77 Tatooine
#> 2 C-3PO yellow 167 75 Tatooine
#> 3 R2-D2 red 96 32 Naboo
#> 4 Darth Vader yellow 202 136 Tatooine
#> 5 Leia Organa brown 150 49 Alderaan
#> ... ... ... ... ... ...
#> 83 BB8 black NA NA NA
#> 84 Captain Phasma unknown NA NA NA
#> 85 San Hill gold NA NA Muunilinst
#> 86 Shaak Ti black NA NA Shili
#> 87 Grievous green, yellow NA NA Kalee
inner_join(Da, Db[1:3, ])
#> Joining with `by = c("name", "eye_color")`
#> DataFrame with 3 rows and 5 columns
#> Groups: eye_color
#> name eye_color height mass homeworld
#> <character> <character> <integer> <numeric> <character>
#> row1 Luke Skywalker blue 172 77 Tatooine
#> row2 C-3PO yellow 167 75 Tatooine
#> row3 R2-D2 red 96 32 Naboo
full_join(Da, Db[1:3, ])
#> Joining with `by = c("name", "eye_color")`
#> DataFrame with 10 rows and 5 columns
#> Groups: eye_color
#> name eye_color height mass homeworld
#> <character> <character> <integer> <numeric> <character>
#> row1 Luke Skywalker blue 172 77 Tatooine
#> row2 C-3PO yellow 167 75 Tatooine
#> row3 R2-D2 red 96 32 Naboo
#> row5 Leia Organa brown 150 49 NA
#> row6 Owen Lars blue 178 120 NA
#> row7 Beru Whitesun Lars blue 165 75 NA
#> row4 Darth Vader yellow 202 136 NA
#> row9 Biggs Darklighter brown 183 84 NA
#> row10 Obi-Wan Kenobi blue-gray 182 77 NA
#> row8 R5-D4 red 97 32 NA This seems to work - for left, inner, and full joins the rownames are preserved (note the order change for |
Thanks for the feedback! all good points, I'll work on implementing this. I agree on One extra edge case I see regarding |
Just checking in - any progress? |
limited bandwidth right now, but I'll have more time in May |
No rush on my side - let's target the next Bioconductor release (October?). |
Following up on issue #13.
I modified how to deal with groups: extracting group variables before
merge
and rerunninggroup_by
aftermerge
. I'm not sure I follow your comment here:join
should not drop columns? The only edge case I can imagine is ia grouping column would being allNA
in case ofright_join
where there is no match, or maybe being constant if only one category matches.I dropped the
arrange
from the tests excepted forright_join
.One last thought I have is regarding
rownames
. I thinkdplyr
behavior behavior is to droprownames
withjoin
operations and right now that would be the same with this implementation. I think that's fine but wanted to point that out.