R/calc_deriv.R
calc_deriv.Rd
In the case of non-linear IND responses to pressures first derivatives
of the response curve are calculated to identify whether the IND ceases
to respond at the lower or upper end of the pressure range (relevant for
sub-crit. 9.2). calc_deriv
serves as a wrapper function that filters
first the model input tibble and applies as default method the `conditional
bootstrap` (see the function cond_boot
). It calculates from
the computed bootstrapped derivatives and confidence intervals the
proportion of pressure range in which the IND shows a significant change.
The implementation of the `conditional bootstrap` for Generalized Additive
Mixed Models (GAMMs) has some caveats (see Details), which is why
we implemented additionally a rather quick-and-dirty approach (through the
approx_deriv
function). In the `approx_deriv` method derivatives
are calculated directly from the smoothing curve of the original method. The
confidence intervals are then just approximated from the standard errors of the
smoothing curve. For more informations on this method see
approx_deriv
.
calc_deriv( init_tbl, mod_tbl, edf_filter = 1.5, sign_level = 0.05, excl_outlier = FALSE, method = "cond_boot", n_boot = 200, ci_boot = 0.95, ci_prop_se = 25, par_comp = FALSE, no_clust = NULL, seed = NULL )
init_tbl | The output tibble of the |
---|---|
mod_tbl | A model output tibble from |
edf_filter | The minimum edf value at which derivatives are calculated for the respective model. The default is set to 1.5. |
sign_level | Significance level for selecting models on which to calculate the derivatives. Only models with a p value smaller than the sign_level will be selected; the default is 0.05. |
excl_outlier | logical; if TRUE, the outliers excluded in the original models will be also excluded in the bootstrapped models. |
method | Method for calculating derivatives and CI; can be either `conditional bootstrap` (default) or `approx_deriv`. |
n_boot | Number of bootstraps. Select n_boot so that (n_boot - (n_boot *ci)) / 2 will be an integer. Otherwise, the function will increase n_boot automatically. The default is set to 200. |
ci_boot | Confidence interval of the bootstrapped smoothing functions and their derivatives. Must be between 0 and 1, default is 0.95. |
ci_prop_se | A conversion factor for approximating derivative CIs in the `approx_deriv` method; it is multiplied with the ratio between s.e. and mean fitted values of the smoothing curve to represent some level of uncertainty around the slope proportional to the uncertainty in the smoothing curve. Default is 25, which is a compromise representing fairly well the results obtained for the GAMs from the conditional bootstrap. |
par_comp | logical; if TRUE, the conditional bootstrap will be processed in parallel using several clusters, which can speed up the iteration process depending on the number of n_boot, models to bootstrap and number of processor cores. |
no_clust | Number of clusters ("workers") for the parallel computation, with one cluster per core. If no_clust is set to NULL default, the number of clusters is set as the numbers of available cores – 1. |
seed | A single value, interpreted as an integer, which specifies the seed of the random number generator (RNG) state for reproducibility. Due to the work splitting in the parallel computation, RNG streams are not comparable with the stream under serial computation. To reproduce results use the same type of computation with the same seed and number of clusters. |
The function returns the input model tibble with the following 12 columns added
prop
The proportion of the observed pressure range where the IND indicator shows a response (see the last section in Details). Significant models with edf < edf_filter get as default a value of 1.0 (i.e., the IND responses to the entire observed pressure range).
zero_in_conf
A list-column of logical vectors indicating for every pressure value (in press_seq) whether the slope of the IND response at that pressure value is within the confidence interval, i.e. is zero.
zic_start_end
A list-column of logical vectors indicating for every pressure value (in press_seq) whether the slope is considered as zero for the proportion calculation (see the last section in Details).
press_seq
A list-column with sequences of 100 evenly spaced pressure values.
pred
A list-column with the predicted indicator responses averaged across all bootstraps (for the 100 equally spaced pressure values).
pred_ci_up
A list-column with the upper confidence limit of the bootstrapped predictions.
pred_ci_low
A list-column with the lower confidence limit of the bootstrapped predictions.
deriv1
A list-column with the first derivatives of the indicator responses averaged across all bootstraps (for the 100 equally spaced pressure values).
deriv1_ci_up
A list-column with the upper confidence limit of the bootstrapped first derivatives.
deriv1_ci_low
A list-column with the lower confidence limit of the bootstrapped first derivatives.
adj_n_boot
The number of successful bootstrap samples that was actually used for calculating the mean and confidence intervals of the predicted indicator response and the derivative.
boot_error
A list-column capturing potential error messages that occurred as side effects when refitting the GAM(M)s on each bootstrap sample.
If none of the significant models has edf > edf_filter, only the variable prop
will be added. If the `approx_deriv` method was used, the output tibble will
not contain the pred
, pred_ci_up
, and pred_ci_low
variables.
In the case of non-linear IND responses to pressures first derivatives of the response curve are calculated to identify whether the IND ceases to respond at the lower or upper end of the pressure range (relevant for sub-crit. 9.2).
First Derivative:
The first derivative of a smoothing function i.e. s`(x),
represents the instantaneous rate of change of a dependent variable y to
that of the independent variable x. It tells us whether a function is
increasing or decreasing, and by how much. This information is reflected by
the slope of the tangent line to a point x on the graph of a function.
A positive slope tells us that, as x increases, s(x) also increases.
Derivatives can be applied to any function, even one as potentially
complex as a fitted spline function, by using the method of finite
differences (Trenkel and Rochet, 2009) as done in the cond_boot
function: The local first derivatives of the fitted GAM(M) time series
is estimated as the difference between fitted values at time step t and
t + d divided by d, where d is a small value, e.g. 1E-7.
Confidence intervals (CI) based on a conditional bootstrap:
By calculating approximate confidence intervals for s`(x), it may
be determined whether or not apparent IND changes are statistically
significant, i.e. whether the non-zero estimate obtained for
the slope is distinguishable from zero given the uncertainty in the
estimate (Fewster et al., 2000). To estimate the CI of
the estimated first derivative, a conditional bootstrap is
carried out by resampling from the GAM residuals (in the
cond_boot
function) (Large et al., 2013).
This approach has the advantage to retain the information in the
pressure variable and to be distribution-free.
In specific, new IND time series are created in cond_boot
by
resampling from the residuals of the original IND-Pressure GAM(M) and
adding these to the original IND time series repeatedly (the number of
repetitions is defined by n_boot).
A separate GAM(M) is then fitted to each bootstrap IND
time series in cond_boot
, using the same effective degrees
of freedom (edf) as found optimal for the original IND time series. For
each of the bootstrapped GAMs, predicted IND time series given an evenly
spaced pressure sequence are produced for which the first derivatives
are calculated. Confidence intervals are computed by sorting the
bootstrapped derivatives into ascending order and calculating the
upper and lower percentiles defined by the ci
argument (the
default is the 2.5% and 97.5% percentiles representing the 95% CI).
A problem with GAMMs is that the smoothing parameters cannot be fixed,
because the gamm
function will treat the wiggly
components of the smooth terms as random effects, the variance of which
to be estimated by lme. Hence, the GAMMs refitted on the bootstrapped IND
time series are allowed to have every shape from linear to the max. edfs
set originally. This leads to often positive as well as negative linear
smoothers, which increases the confidence intervals of both fitted
smoothers as well as derivative. When the IND response is weak and/or
the error around the smoother high, the implemented routine to estimate
the proportion of pressure range with significant slope can lead to 0%.
For those models, the method `approx_deriv` can be applied for comparison.
Calculating the proportion of pressure range:
The lack of any further IND response to pressure changes at
its minimum or maximum measured is noted when zero is contained
within the CI of the first derivative and quantified by calculating the
proportion of points (evenly distributed along the pressure axis)
inside the CI for scoring criteria 10.2. In specific,
calc_deriv
implements a routine, which identifies
first which pressure values has a slope of zero (see the returned
list-column zero_in_conf
). It then checks whether the first
value in zero_in_conf
is TRUE (i.e., the lowest value of the
evenly spaced pressure sequence has zero slope). If so, the first value
in the vector zic_start_end (see the returned list-column
zic_start_end
) is also set to TRUE, meaning it goes into
the proportion calculation as no IND response. The routine proceeds
to the next values zero_in_conf
and stops at the first FALSE value.
It then starts the same procedure from end of the zero_in_conf
vector. The proportion of pressure range (returned as prop
variable in the output tibble) reflects the proportion of FALSE values
in zero_in_conf
.
Fewster, R.M., Buckland, S.T., Siriwardena, G.M., Baillie, S.R., Wilson, J.D. (2000) Analysis of population trends for farmland birds using generalized additive models. Ecology 81, 1970-1984.
Large, S.I., Fay, G., Friedland, K.D., Link, J.S. (2013) Defining trends and thresholds in responses of ecological indicators to fishing and environmental pressures. ICES Journal of Marine Science 70, 755-767.
Trenkel, V.M., Rochet, M.J. (2009) Intersection-union tests for characterizing recent changes in smoothed indicator time series. Ecological Indicators 9, 732-739.
# \donttest{ # Using some models of the Baltic Sea demo data init_tbl <- ind_init_ex[ind_init_ex$id %in% c(5,9,48,75), ] mod_tbl <- merge_models_ex[merge_models_ex$id %in% c(5,9,48,75), ] deriv_tbl <- calc_deriv(init_tbl = init_tbl, mod_tbl = mod_tbl, n_boot = 40, par_comp = FALSE, seed=1, method = "approx_deriv") # }