state_space_ed() and state_space_ch() now also allow irregular time series (previously an error message was returned).
state_space_ed() now includes an argument na_rm = TRUE (set as default) to deal with missing values, i.e. cases with NAs are dropped from the dataset prior to the analysis, but the returned tibble still includes these years (showing NA in the $ed column).
Some minor fixes to pass all platform builds.
Fixed a minor bug in the NRMSE model prediction plot, where test observations were not shown in the plot.
Changed the length of the pressure sequence in calc_deriv() and the helper functions approx_deriv() and cond_boot() from the length of the indicator time series to 100, to make the prediction plots generated by plot_model() and its helper functions smoother.
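A minimal sketch of why a fixed-length grid helps, assuming a simple mgcv GAM fitted to simulated data (this is not the calc_deriv() code; the variable names press, ind and press_seq are illustrative only). Predicting on 100 evenly spaced pressure values gives a much denser curve than predicting only at the observed pressure values of a short time series.

```r
library(mgcv)

# Simulated toy data: a short, noisy indicator time series
set.seed(1)
dat <- data.frame(press = runif(30, 0, 10))
dat$ind <- sin(dat$press) + rnorm(30, sd = 0.2)

# Fit a smoothing model of the indicator on the pressure
fit <- gam(ind ~ s(press, k = 4), data = dat)

# Evaluate the fit on 100 evenly spaced pressure values instead of the
# (few, unevenly spaced) observed ones, which yields a smoother plotted curve
press_seq <- seq(min(dat$press), max(dat$press), length.out = 100)
pred <- predict(fit, newdata = data.frame(press = press_seq))
```

The grid length is a trade-off: more points give smoother lines at a slightly higher computational cost, which matters mostly for bootstrapped derivatives.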
plot_trend() was slightly changed so that, if a model is not available, only the observations are plotted.
Individual panels in plot_diagnostics() now also show the model formulation and model type as subtitle (similar to the multipanel plots in the $all_plots column).
The multipanel plot in plot_model() now includes the model type in the title in addition to the formula.
A new data validation routine in the internal check_ind_press() function (used in model_trend() and ind_init()) checks for unwanted characters in the column names of indicator and pressure datasets, which caused errors when building the models. These characters are now removed or replaced by hyphens.
Some internal adjustments to changes in newer R versions and packages (e.g. arguments that are deprecated in the newer tidyr version were removed from all functions that use the nest() and unnest() functions from the tidyr package).
The summary_sc() function has a new 3rd output list, which shows all the pressure-independent scores and the pressure-specific scores for both sensitivity and robustness (i.e. the sum of the C9 and C10 sub-criteria) as a matrix. This table now serves as the basis for some score-based IND performance functions (namely dist_sc() and plot_spiechart()).
dist_sc() now takes as input the new sub$scores_matrix from the summary_sc() function (instead of the output tibble from the scoring() function).
NRMSE computation in model_gam() and model_gamm() is now based on the standard deviation instead of the mean as before. This changes the overall scale of the NRMSE, hence the cut-off values for the scoring were adjusted in the criteria score template (crit_scores_tmpl): from > 0.4 (score 0), > 0.1 (score 1) and <= 0.1 (score 2) to > 2 (score 0), > 1 (score 1) and <= 1 (score 2).
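For reference, the two normalizations can be written with the usual textbook definitions below, where y_i are the observations, \hat{y}_i the predictions and n the number of observations (whether the package uses the n - 1 sample standard deviation is an assumption here). Since a model that always predicts \bar{y} has an RMSE of \sqrt{(n-1)/n}\, s_y, i.e. close to s_y, an NRMSE_sd of about 1 corresponds roughly to the skill of simply predicting the mean, which is one way to read the new cut-off at 1.

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \qquad
\mathrm{NRMSE}_{\text{mean}} = \frac{\mathrm{RMSE}}{\bar{y}}, \qquad
\mathrm{NRMSE}_{\text{sd}} = \frac{\mathrm{RMSE}}{s_y}, \quad
s_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
$$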
The actual function for computing the NRMSE is now available as the standalone function nrmse(); it allows 4 different types of normalization and has additional arguments for specifying the type of transformation applied to the observations prior to the analysis. If a transformation is specified, the function computes the NRMSE on the back-transformed observations and predictions, which is recommended for indicator cross-comparisons (see also https://marinedatascience.co/blog/2019/01/07/normalizing-the-rmse/).
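A minimal base-R sketch of the idea, assuming only two of the normalizations ("sd" and "mean") and a back-transformation passed as a function. The helper nrmse_sketch() and its arguments are hypothetical and do not reproduce the signature of the package's nrmse().

```r
# Hypothetical helper, NOT the package's nrmse() function
nrmse_sketch <- function(pred, obs, method = c("sd", "mean"),
                         back_trans = identity) {
  method <- match.arg(method)
  # Back-transform predictions and observations to the original scale
  pred <- back_trans(pred)
  obs <- back_trans(obs)
  # Root mean squared error on the original scale
  rmse <- sqrt(mean((obs - pred)^2))
  # Normalize by the spread (sd) or the level (mean) of the observations
  norm <- switch(method,
    sd   = stats::sd(obs),
    mean = mean(obs)
  )
  rmse / norm
}

# Usage: an indicator that was log-transformed before modelling
obs_log <- log(c(2, 4, 8, 16))
pred_log <- log(c(2.5, 3.5, 9, 15))
nrmse_sketch(pred_log, obs_log, method = "sd", back_trans = exp)
```

Computing the error after back-transformation puts all indicators on their original scales, so a log-transformed and an untransformed indicator are normalized in the same way.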
The internal calc_nrmse() has been rewritten as a wrapper function of nrmse(). It now not only serves as an internal helper function for model_gam() and model_gamm(), but can also be used to compute the NRMSE for all models with settings other than the default (e.g. a different normalization method or partial back-transformations). The function takes as input the model list (e.g. $model in the final model tibble), a list of indicator values (e.g. the $ind_test vectors from the ind_init() function) and a list of pressure values (e.g. the $press_test vectors). It first calculates the predicted values given the model and pressure values, then applies the back-transformation (if specified) and finally computes the NRMSE for the individual models.
All example data have been updated and now include the NRMSE based on the standard deviation, with back-transformation where indicator time series were log-transformed.
The function dist_sc_group() was added, which calculates the distance matrix averaged across groups, i.e. it works similar to a weighted distance matrix.
All functions now incorporate tidy evaluation principles to account for the recent updates of dplyr, ggplot2 and all other tidyverse packages, i.e.:
* all deprecated SE versions of the main tidyverse verbs have been replaced with the main verb, using !!rlang::sym() to create symbols from the variables provided as strings and unquote them directly in the capturing functions (see https://github.com/r-lib/rlang/issues/116).
* aesthetic mappings in internal ggplot functions were previously based on individual vectors (by setting data = NULL). In the updated version, aesthetic variables are provided in a data frame explicitly defined in the data argument and referred to using !!rlang::sym().
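A minimal sketch of both patterns, assuming dplyr (>= 1.0.0, for the .groups argument), rlang and ggplot2 (>= 3.0.0) are installed. The helpers summarise_by() and plot_xy() and the toy data frame are hypothetical and only illustrate the technique, not the package internals.

```r
library(dplyr)
library(rlang)
library(ggplot2)

# Column names arrive as strings; sym() turns them into symbols
# and !! unquotes them inside the capturing dplyr verbs
summarise_by <- function(data, group_col, value_col) {
  data %>%
    group_by(!!rlang::sym(group_col)) %>%
    summarise(mean_value = mean(!!rlang::sym(value_col)), .groups = "drop")
}

# Aesthetics refer to columns of the data frame given in the data argument,
# instead of individual vectors with data = NULL
plot_xy <- function(data, x_col, y_col) {
  ggplot(data = data, aes(x = !!rlang::sym(x_col), y = !!rlang::sym(y_col))) +
    geom_point()
}

df <- data.frame(id = c("a", "a", "b"), x = c(1, 3, 5))
res <- summarise_by(df, "id", "x")
p <- plot_xy(df, "id", "x")
```

Replacing the deprecated underscore verbs (e.g. group_by_()) this way keeps string-based function arguments while using the standard verbs.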
With the upcoming release of ggplot2 v2.3.0, we deactivated our visual tests to avoid conflicts between generated and reference plots that would cause tests to fail.
Minor modifications in the test files to pass all system checks on CRAN.
All functions now have data input validation routines that return detailed messages if the required input does not have the correct format. This prevents potential error messages when running subsequent functions.
In all modeling functions, potential error messages that occur as side effects of the model fitting are captured and either printed together with the model id, indicator and pressure variable or saved in the output tibble.
plot_spiecharts() now orders the pressure-specific slices correctly according to the pressure types.
All modeling functions can now handle all basic distribution families and some of the mgcv families.
expect_response() now returns the modified input tibble with the correct column names.
In model_gamm(), the length of the list of outliers to exclude (excl_outlier argument) is now correctly estimated in the data input validation routine.