Description
Code Sample
What do you think about a groupby keyword?
The goal is to create plots showing the correlation skill for each initial month, as in:
https://nbviewer.org/github/mktippett/NMME/blob/master/n34.ipynb
import numpy as np

# correlation as function of start month
# need to compute the means over values where both are not missing
def ac_by_start(x, y):
    # keep only points where both x and y are present
    ok = ~np.isnan(x) & ~np.isnan(y)
    # anomalies relative to each start month's mean
    xa = x.where(ok).groupby('S.month') - x.where(ok).groupby('S.month').mean('S')
    ya = y.where(ok).groupby('S.month') - y.where(ok).groupby('S.month').mean('S')
    c = (xa*ya).groupby('S.month').mean('S')/xa.groupby('S.month').std('S')/ya.groupby('S.month').std('S')
    c.attrs['long_name'] = 'correlation'
    c.month.attrs['long_name'] = 'start month'
    # c = xr.corr(x.groupby('S.month'), y.groupby('S.month'), dim='S')
    return c
With the proposed keyword, this would become xs.pearson_r(a, b, dim="S", group="month").
Problem description
For many metrics, such as rmse, this doesn't matter much: you can compute the metric over no dimension (dim=[]) and then groupby manually afterwards. For correlation metrics this cannot be done, because they must be computed over a dimension. I therefore propose adding a groupby keyword for correlation metrics, following the pattern used by @mktippett in https://nbviewer.org/github/mktippett/NMME/blob/master/n34.ipynb
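To make the difference concrete, here is a minimal sketch of the "group afterwards" route for rmse. It uses plain xarray rather than xskillscore, with an elementwise squared error standing in for a metric called with dim=[]; the synthetic data and all variable names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly series on a start-date dimension "S" (assumed data).
rng = np.random.default_rng(0)
S = pd.date_range("2000-01-01", periods=48, freq="MS")
a = xr.DataArray(rng.normal(size=S.size), dims="S", coords={"S": S})
b = a + xr.DataArray(rng.normal(size=S.size), dims="S", coords={"S": S})

# Elementwise squared error: nothing is reduced yet, like a metric with dim=[].
sq_err = (a - b) ** 2

# Grouping afterwards gives one RMSE per start month.
rmse_by_month = np.sqrt(sq_err.groupby("S.month").mean("S"))

# Cross-check against computing the RMSE directly on the January subset.
jan = S.month == 1
rmse_jan = float(np.sqrt(((a.values[jan] - b.values[jan]) ** 2).mean()))
```

A correlation has no such elementwise form: it needs the group's means and standard deviations before any product can be averaged, so the grouping has to happen before the reduction, not after it.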
Applies to metrics:
- pearson_r
- spearman_r
- others that require dim != []
Alternatives
Loop over the months and subselect the data before computing the correlation:
months = range(1, 13)
corr = xr.concat([xs.pearson_r(a.sel({dim: a[dim].dt.month == m}), b.sel({dim: a[dim].dt.month == m}), dim=dim) for m in months], "month")
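The loop above can be wrapped in a small helper that behaves roughly like the proposed keyword. This is only a sketch: corr_by_month is a hypothetical name, not part of xskillscore or xarray, and it uses xr.corr (which skips pairs where either value is missing) in place of an xskillscore metric. The synthetic data and the lead dimension "L" are assumptions too.

```python
import numpy as np
import pandas as pd
import xarray as xr


def corr_by_month(a, b, dim="S"):
    """Pearson correlation over `dim`, computed separately per calendar month.

    Hypothetical helper for illustration; assumes `a` and `b` share a
    datetime coordinate along `dim`.
    """
    month = a[dim].dt.month.values
    months = np.unique(month)
    pieces = []
    for m in months:
        idx = np.flatnonzero(month == m)  # positions belonging to month m
        pieces.append(xr.corr(a.isel({dim: idx}), b.isel({dim: idx}), dim=dim))
    # Label the new dimension with the month numbers.
    return xr.concat(pieces, dim=pd.Index(months, name="month"))


# Usage with synthetic data: 5 years of monthly starts, 3 lead times.
rng = np.random.default_rng(0)
S = pd.date_range("2000-01-01", periods=60, freq="MS")
a = xr.DataArray(rng.normal(size=(S.size, 3)), dims=("S", "L"), coords={"S": S})
b = a + rng.normal(size=(S.size, 3))
c = corr_by_month(a, b, dim="S")
```

The result has a "month" dimension in place of "S", which is the shape a call like xs.pearson_r(a, b, dim="S", group="month") would presumably return.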