groupby keyword #396

Open
@aaronspring

Description

Code Sample

What do you think about a groupby keyword?

It would make it easy to create plots showing the correlation skill for each initial month, as in:

https://nbviewer.org/github/mktippett/NMME/blob/master/n34.ipynb

import numpy as np

# correlation as a function of start month
# the means must be computed over values where both inputs are not missing
def ac_by_start(x, y):
    ok = ~np.isnan(x) & ~np.isnan(y)
    # anomalies relative to each start month's mean
    xa = x.where(ok).groupby('S.month') - x.where(ok).groupby('S.month').mean('S')
    ya = y.where(ok).groupby('S.month') - y.where(ok).groupby('S.month').mean('S')
    # mean product of anomalies divided by both standard deviations
    c = (xa*ya).groupby('S.month').mean('S')/xa.groupby('S.month').std('S')/ya.groupby('S.month').std('S')
    c.attrs['long_name'] = 'correlation'
    c.month.attrs['long_name'] = 'start month'
    # c = xr.corr(x.groupby('S.month'), y.groupby('S.month'), dim='S')
    return c

With the proposed keyword, this would become xs.pearson_r(a, b, dim="S", group="month")
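A minimal sketch of what such a keyword might do internally, using only xarray. The helper name pearson_r_by_group is hypothetical and not part of xskillscore, and xr.corr stands in for xs.pearson_r so the example has no extra dependency. It assumes the dimension holds datetimes and that group names a datetime component such as "month".

```python
import numpy as np
import pandas as pd
import xarray as xr


def pearson_r_by_group(a, b, dim, group):
    """Hypothetical helper: correlate a and b over `dim` within each group.

    `group` is assumed to be a datetime component of `dim`, e.g. "month".
    """
    ds = xr.Dataset({"a": a, "b": b})
    # Group labels, e.g. the calendar month of every start date.
    labels = getattr(ds[dim].dt, group)
    # xr.corr is used here in place of xs.pearson_r (assumption of this sketch).
    grouped = ds.groupby(labels).map(
        lambda g: xr.corr(g["a"], g["b"], dim=dim).to_dataset(name="r")
    )
    return grouped["r"]


# Synthetic data: b equals a for January to June and -a for July to December.
S = pd.date_range("2000-01-01", periods=48, freq="MS")
rng = np.random.default_rng(0)
a = xr.DataArray(rng.normal(size=48), coords={"S": S}, dims="S")
sign = xr.where(a["S"].dt.month <= 6, 1.0, -1.0)
b = a * sign

r = pearson_r_by_group(a, b, dim="S", group="month")
```

The result has a single month dimension, so it can be plotted directly against start month.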

Problem description

For many metrics such as rmse this doesn't matter much, because you can compute the metric over no dimension (dim=[]) and then group manually afterwards. Correlation metrics, however, cannot be handled this way, because they have to be computed over a dimension. Therefore I propose adding a groupby keyword for correlation metrics, following the approach used by @mktippett in https://nbviewer.org/github/mktippett/NMME/blob/master/n34.ipynb

Applies to metrics:

  • pearson_r
  • spearman_r
  • any other metric that requires dim != []
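To illustrate why error metrics can be grouped after the fact, here is a small sketch with synthetic data. It assumes the elementwise quantity is the squared error (what a mean squared error over no dimension would return) and takes the square root only after grouping. The variable names are illustrative.

```python
import numpy as np
import pandas as pd
import xarray as xr

S = pd.date_range("2000-01-01", periods=48, freq="MS")
rng = np.random.default_rng(0)
a = xr.DataArray(rng.normal(size=48), coords={"S": S}, dims="S")
b = xr.DataArray(rng.normal(size=48), coords={"S": S}, dims="S")

# Elementwise squared error: nothing is reduced yet, as with dim=[].
squared_error = (a - b) ** 2

# Grouping afterwards gives the RMSE per start month.
rmse_by_month = np.sqrt(squared_error.groupby("S.month").mean("S"))

# Cross-check for January: subselect first, then reduce.
jan = S.month == 1
rmse_jan = float(np.sqrt(((a.isel(S=jan) - b.isel(S=jan)) ** 2).mean()))
```

A correlation has no such elementwise form, because the anomalies depend on the per-group means, which is why the grouping has to happen inside the metric.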

Alternatives

Loop over the months and subselect the data before computing the metric:

corr = xr.concat([xs.pearson_r(a.sel({dim: a[dim].dt.month == m}), b.sel({dim: a[dim].dt.month == m}), dim=dim) for m in months], "month")
