Modeling of indicator trends

The function models the long-term trend of each indicator (IND) using Generalized Additive Models (GAMs) and returns a tibble with IND-specific GAM outputs.

model_trend(
  ind_tbl,
  time,
  train = 1,
  random = FALSE,
  k = 4,
  family = stats::gaussian()
)

Arguments

ind_tbl	A data frame, matrix or tibble containing only the (numeric) IND variables. Single indicators should be coerced into a data frame to keep the indicator name. If a single indicator is passed as a vector, the default name `ind` is used.
time	A vector containing the actual time steps (e.g. years); these should correspond to the observations in the IND data.
train	The proportion of observations that should go into the training data on which the GAMs are fitted. Has to be a numeric value between 0 and 1; the default is 1 (i.e. the full time series is fitted).
random	logical; should the observations for the training data be randomly chosen? Default is FALSE.
k	Number of knots (passed to the smoothing function `s`); the default is 4.
family	A description of the error distribution and link to be used in the GAM. This needs to be defined as a family function (see also `family`). All standard family functions can be used, as well as some of the distribution families in the mgcv package (see `family.mgcv`; e.g. `negbin` or `nb`).
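
As an illustration of how `train` and `random` interact, the following minimal sketch selects training observations from a time vector. It is not the package's internal code: the variable names `n_train` and `train_id`, the rounding rule, and the choice of the first observations when `random = FALSE` are assumptions made for this example.

```r
set.seed(42)                 # only needed to make the random draw reproducible
time   <- 1979:2008          # 30 hypothetical time steps
train  <- 0.5                # proportion of observations used for training
random <- TRUE               # draw the training observations at random?

n       <- length(time)
n_train <- round(train * n)  # assumed rounding rule

# Assumption: without random sampling the first n_train observations are used
train_id <- if (random) sort(sample(n, n_train)) else seq_len(n_train)

time_train <- time[train_id]
length(time_train)
```

With `train = 0.5` and 30 time steps, 15 observations end up in the training data, which matches the `<int [15]>` entries shown in the example output of this page.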

Value

The function returns a tibble, which is a trimmed down version of the data.frame(), including the following elements:

ind_id: Indicator IDs.
ind: Indicator names. These might be modified to exclude any character that is not allowed in the model formula (e.g. hyphens, brackets, etc. are replaced by an underscore; variables starting with a number get an x before the number).
p_val: The p values for the smoothing term (here time).
model: A list-column of indicator-specific gam objects.
ind_train: A list-column with indicator values of the training data.
time_train: A list-column with the time values (e.g. years) of the training data.
pred: A list-column with indicator values predicted from the GAM for the training period.
ci_up: A list-column with the upper limit of the 95% confidence interval of the predicted indicator values.
ci_low: A list-column with the lower limit of the 95% confidence interval of the predicted indicator values.
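
The name cleaning described for `ind` can be approximated with base R. The helper `clean_ind_names()` below is hypothetical and only mirrors the rules stated above (non-alphanumeric characters become underscores, a leading digit gets an `x` prefix); the package may implement this differently.

```r
# Hypothetical helper, not part of the package
clean_ind_names <- function(x) {
  # Replace every character that is not a letter, digit or underscore
  x <- gsub("[^A-Za-z0-9_]", "_", x)
  # Prefix names that start with a digit with an "x"
  sub("^([0-9])", "x\\1", x)
}

clean_ind_names(c("Cod-SSB", "2nd(ind)", "MS"))
```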

Details

To test for linear or non-linear long-term changes, each indicator (IND) in the ind_tbl is modeled as a smoothing function of the time vector (usually years) using the gam function. The trend can be tested for the full time series (i.e. all observations are used as training data) or for a random or selected subset.

The GAMs are built using the default settings in the gam function and the smooth term function s(). However, the user can adjust the distribution and link by modifying the family argument, as well as the maximum level of non-linearity by setting the number of knots:

gam(ind ~ s(time, k = k), family = family, data = training_data)
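
To make the call above concrete, here is a minimal sketch that fits the same model formula with mgcv on simulated data and derives quantities analogous to `p_val`, `pred`, `ci_up` and `ci_low`. The data are made up, and the interval of plus or minus 1.96 standard errors is an assumption for illustration; the package may compute its confidence intervals differently.

```r
library(mgcv)

set.seed(1)
# Simulated stand-in for one indicator time series (not package data)
training_data <- data.frame(time = 1979:2008)
training_data$ind <- sin((training_data$time - 1979) / 5) +
  rnorm(nrow(training_data), sd = 0.3)

k <- 4
family <- stats::gaussian()
fit <- gam(ind ~ s(time, k = k), family = family, data = training_data)

# p value of the smooth term s(time)
p_val <- summary(fit)$s.table[1, "p-value"]

# Predictions for the training period with standard errors
pr   <- predict(fit, type = "response", se.fit = TRUE)
pred <- as.numeric(pr$fit)

# Assumed approximate 95% interval: prediction +/- 1.96 * standard error
ci_up  <- pred + 1.96 * as.numeric(pr$se.fit)
ci_low <- pred - 1.96 * as.numeric(pr$se.fit)
```

A larger `k` allows a more flexible smooth, while a non-Gaussian `family` (e.g. `stats::poisson()` for counts) changes the error distribution and link without changing the formula.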

Examples

# Using the Baltic Sea demo data in this package
ind_tbl <- ind_ex[ ,-1] # excluding the year
time <- ind_ex$Year
# Using the default settings
trend_tbl <- model_trend(ind_tbl, time)
# Change the training and test data assignment
model_trend(ind_tbl, time, train = .5, random = TRUE)
#> # A tibble: 12 × 9
#>    ind_id ind        p_val model  ind_train  time_train pred       ci_up  ci_low
#>     <int> <chr>      <dbl> <list> <list>     <list>     <list>     <list> <list>
#>  1      1 TZA     0.0633   <gam>  <dbl [15]> <int [15]> <dbl [15]> <dbl … <dbl …
#>  2      2 MS      0.442    <gam>  <dbl [15]> <int [15]> <dbl [15]> <dbl … <dbl …
#>  3      3 rCC     0.836    <gam>  <dbl [15]> <int [15]> <dbl [15]> <dbl … <dbl …
#>  4      4 Cops    0.791    <gam>  <dbl [15]> <int [15]> <dbl [15]> <dbl … <dbl …
#>  5      5 Micro   0.617    <gam>  <dbl [15]> <int [15]> <dbl [15]> <dbl … <dbl …
#>  6      6 rZPPP   0.391    <gam>  <dbl [15]> <int [15]> <dbl [15]> <dbl … <dbl …
#>  7      7 Sprat   0.152    <gam>  <dbl [15]> <int [15]> <dbl [15]> <dbl … <dbl …
#>  8      8 Herring 0.419    <gam>  <dbl [15]> <int [15]> <dbl [15]> <dbl … <dbl …
#>  9      9 Stickle 0.000209 <gam>  <dbl [15]> <int [15]> <dbl [15]> <dbl … <dbl …
#> 10     10 Cod     0.0302   <gam>  <dbl [15]> <int [15]> <dbl [15]> <dbl … <dbl …
#> 11     11 SPF     0.0477   <gam>  <dbl [15]> <int [15]> <dbl [15]> <dbl … <dbl …
#> 12     12 LPF     0.0419   <gam>  <dbl [15]> <int [15]> <dbl [15]> <dbl … <dbl …
# To keep the name when testing only one indicator, coerce vector to data frame
model_trend(data.frame(MS = ind_tbl$MS), time, train = .5, random = TRUE)
#> # A tibble: 1 × 9
#>   ind_id ind   p_val model  ind_train  time_train pred       ci_up      ci_low  
#>    <int> <chr> <dbl> <list> <list>     <list>     <list>     <list>     <list>  
#> 1      1 MS    0.602 <gam>  <dbl [15]> <int [15]> <dbl [15]> <dbl [15]> <dbl [1…
