The tidyhydro package provides a set of commonly used metrics in
hydrology (such as NSE, KGE, pBIAS) for use within a
tidymodels infrastructure. Originally
inspired by the
yardstick and
hydroGOF packages, this
library is written mainly in C++ and computes the
desired goodness-of-fit criteria quickly.
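
For readers less familiar with these criteria, the base R sketch below spells out their commonly cited definitions. It is only an illustration, not the package's C++ code: the helper names `nse_ref`, `kge_ref` and `pbias_ref` are hypothetical, KGE is assumed to take the form given by Gupta et al. (2009), and the sign convention for pBIAS (simulated minus observed, so positive values mean overestimation) is an assumption that differs between references.

```r
# Reference definitions in base R.
# Hypothetical helper names; these are not part of the tidyhydro API.

# Nash-Sutcliffe Efficiency: 1 minus the ratio of the squared model error
# to the squared deviation of observations from their mean
nse_ref <- function(obs, sim) {
  1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)
}

# Kling-Gupta Efficiency (assumed 2009 form)
kge_ref <- function(obs, sim) {
  r <- cor(obs, sim)             # linear correlation
  alpha <- sd(sim) / sd(obs)     # variability ratio
  beta <- mean(sim) / mean(obs)  # bias ratio
  1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)
}

# Percent bias (assumed convention: positive means overestimation)
pbias_ref <- function(obs, sim) {
  100 * sum(sim - obs) / sum(obs)
}

# Small made-up vectors, for illustration only
obs <- c(10, 12, 15, 11, 9)
sim <- c(11, 12, 14, 12, 10)
c(nse = nse_ref(obs, sim), kge = kge_ref(obs, sim), pbias = pbias_ref(obs, sim))
```

A perfect simulation yields NSE = 1, KGE = 1 and pBIAS = 0 under these definitions, which makes a handy sanity check for any implementation.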
Additionally, you’ll find here C++ implementations of lesser-known yet powerful metrics and descriptive statistics recommended in the United States Geological Survey (USGS) and the New Zealand National Environmental Monitoring Standards (NEMS) guidelines. Examples include PRESS (Prediction Error Sum of Squares), SFE (Standard Factorial Error), MSPE (Model Standard Percentage Error) and others. These are based on the equations from Helsel et al. (2020), Rasmussen et al. (2008), Hicks et al. (2020) and others (see documentation for details).
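
To illustrate one of these statistics, PRESS for an ordinary least-squares regression can be sketched in base R with the well-known leverage shortcut, which avoids refitting the model once per observation. This is a minimal sketch under stated assumptions: `press_ref` is a hypothetical helper, the built-in `cars` dataset stands in for real hydrological data, and the package's own function and signature may differ (see its documentation).

```r
# PRESS (Prediction Error Sum of Squares) for a linear model.
# Hypothetical helper; not the tidyhydro implementation.
press_ref <- function(fit) {
  e <- residuals(fit)   # ordinary residuals
  h <- hatvalues(fit)   # leverages (diagonal of the hat matrix)
  # e / (1 - h) equals the leave-one-out prediction error for each point
  sum((e / (1 - h))^2)
}

# Built-in dataset used purely for illustration
fit <- lm(dist ~ speed, data = cars)
press_ref(fit)
```

The shortcut gives the same value as explicitly dropping each observation, refitting, and summing the squared prediction errors, but needs only a single model fit.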
The tidyhydro package follows the philosophy of
yardstick and
provides S3 class methods for vectors and data frames. For example, one
can estimate KGE, NSE or pBIAS for a data frame like this:
library(tidyhydro)
str(avacha)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 365 obs. of 3 variables:
#> $ date: Date, format: "2022-01-01" "2022-01-02" ...
#> $ obs : num 76.2 76.2 76.3 76.3 76.4 76.4 76.5 76.5 76.6 76.6 ...
#> $ sim : num 84.8 84.3 84 83.7 83.4 ...
kge(avacha, obs, sim)
#> # A tibble: 1 × 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 kge standard 0.947

Alternatively, create a
metric_set
and estimate several metrics at once like this:
hydro_metrics <- yardstick::metric_set(nse, pbias)
hydro_metrics(avacha, obs, sim)
#> # A tibble: 2 × 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 nse standard 0.895
#> 2 pbias standard 0.0540

We understand that sometimes one needs a qualitative interpretation
of the model. Therefore, we added a
performance argument to some functions. When performance = TRUE, the metric
interpretation is returned according to Moriasi et
al. (2015).
hydro_metrics(avacha, obs, sim, performance = TRUE)
#> # A tibble: 2 × 3
#> .metric .estimator .estimate
#> <chr> <chr> <chr>
#> 1 nse standard Excellent
#> 2 pbias standard Excellent

In addition to the metric objects inherited from yardstick, tidyhydro
introduces measure objects. A measure calculates a descriptive
statistic of a single dataset, such as cv() (coefficient of
variation, a measure of variability) or gm() (geometric mean, a
measure of central tendency):
# Coefficient of Variation
cv(avacha, obs)
#> # A tibble: 1 × 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 cv standard 0.533
# Geometric mean
gm_vec(avacha$obs)
#> [1] 128.9476

Similarly to metric_set, one can create a measure_set and estimate the
desired descriptive statistics at once:
ms <- measure_set(cv, gm)
ms(avacha, obs)
#> # A tibble: 2 × 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 cv standard 0.533
#> 2 gm standard 129.

You can install the development version of tidyhydro from
GitHub with:
# install.packages("pak")
pak::pak("atsyplenkov/tidyhydro")

Since the package uses Rcpp in the background, it runs
faster than equivalent base R code and other R packages (see
benchmarks).
This is particularly noticeable with large datasets:
set.seed(12234)
x <- runif(10^6)
y <- runif(10^6)
nse <- function(truth, estimate, na_rm = TRUE) {
  #fmt: skip
  1 - (sum((truth - estimate)^2, na.rm = na_rm) /
    sum((truth - mean(truth, na.rm = na_rm))^2, na.rm = na_rm))
}
bench::mark(
  tidyhydro = tidyhydro::nse_vec(truth = x, estimate = y),
  hydroGOF = hydroGOF::NSE(sim = y, obs = x),
  baseR = nse(truth = x, estimate = y),
  check = TRUE,
  relative = TRUE,
  filter_gc = FALSE,
  iterations = 50L
)
#> # A tibble: 3 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 tidyhydro 1 1 18.2 NaN NaN
#> 2 hydroGOF 11.7 11.6 1 Inf Inf
#> 3 baseR 7.19 8.63 1.98 Inf Inf

Please note that the tidyhydro project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.