model_gam applies Generalized Additive Models (GAMs) to each IND~pressure combination created in ind_init and returns a tibble with IND~pressure-specific GAM outputs.

model_gam(init_tbl, k = 5, family = stats::gaussian(), excl_outlier = NULL)

Arguments

init_tbl

The output tibble of the ind_init function.

k

Choice of knots (for the smoothing function s); the default is 5.

family

A description of the error distribution and link to be used in the GAM. This needs to be defined as a family function (see also family). All standard family functions can be used as well some of the distribution families in the mgcv package (see family.mgcv; e.g.negbin or nb).

excl_outlier

A list of values identified as outliers in specific IND~pressure GAMs, which should be excluded in this modeling step (the output tibble of this function includes the variable `pres_outlier`, which is a column-list containing all indices of values with cook`s distance > 1 (see below). The function can be re-run again, then excluding all these outliers provided in $pres_outlier from the the first run (see example)).

Value

The function returns a tibble, which is a trimmed down version of the data.frame(), including the following elements:

id

Numerical IDs for the IND~press combinations.

ind

Indicator names.

press

Pressure names.

model_type

Specification of the model type; at this stage containing only "gam" (Generalized Additive Model).

corrstruc

Specification of the correlation structure; at this stage containing only "none".

aic

AIC of the fitted models

edf

Estimated degrees of freedom for the model terms.

p_val

The p values for the smoothing term (the pressure).

signif_code

The significance codes for the p-values.

r_sq

The adjusted r-squared for the models. Defined as the proportion of variance explained, where original variance and residual variance are both estimated using unbiased estimators. This quantity can be negative if your model is worse than a one parameter constant model, and can be higher for the smaller of two nested models.

expl_dev

The proportion of the null deviance explained by the models.

nrmse

Absolute values of the root mean square error normalized by the standard deviation (NRMSE).

ks_test

The p-values from a Kolmogorov-Smirnov Test applied on the model residuals to test for normal distribution. P-values > 0.05 indicate normally distributed residuals.

tac

logical; indicates whether temporal autocorrelation (TAC) was detected in the residuals. TRUE if model residuals show TAC. NAs in the time series due to real missing values, test data extraction or exclusion of outliers are explicitly considered. The test is based on the following condition: if any of the acf and pacf values of lag 1 - 5 are greater than 0.4 or lower than -0.4, a TRUE is returned.

pres_outlier

A list-column with all indices of values identified as outliers in each model (i.e.cook`s distance > 1). The indices present the position in the training data, including NAs.

excl_outlier

A list-column listing all outliers per model that have been excluded in the GAM fitting

model

A list-column of IND~press-specific gam objects that contain additionally the logical vector indicating missing values ($train_na).

Details

To evaluate the IND`s sensitivity and robustness time series of the IND are modeled as a smoothing function of one single pressure variable (using a subset of the data as training dataset, e.g. excluding the years of the annual time series). The GAMs are build using the default settings in the gam function and the smooth term function s). However, the user can adjust the distribution and link by modifying the family argument as well as the maximum level of non-linearity by setting the number of knots:

gam(ind ~ s(press, k = k), family = family, data = training_data)

In the presence of significant temporal auto-correlation, GAMs should be extended to Generalized Additive Mixed Models (GAMMs) by including auto-regressive error structures to correct for the auto-correlation (Pinheiro and Bates, 2000). This is implemented in the function model_gamm.

The returned tibble contains various model outputs needed for scoring the sensitivity and robustness subcriteria:

  • p_val to identify whether an IND responds to a specific pressure

  • r_sq for the strength of the IND response

  • edf for the non-linearity of the IND response

  • nrmse for the robustness of the established IND~pressure relationship

The robustness of the modeled pressure relationship based on the training data is evaluated by measuring how well the model prediction matches the test dataset, e.g. the last years. This is quantified by computing the absolute value of the normalized root mean square error (NRMSE) on the test dataset. The normalization to the mean of the observed test data allows for comparisons and a general scoring of the model robustness across INDs with different scales or units.

References

Pinheiro, J.C., Bates, D.M. (2000) Mixed-Effects Models in S and S-Plus. Springer, New York, 548pp.

See also

tibble and the vignette("tibble") for more informations on tibbles, gam for more information on GAMs, and plot_diagnostics for assessing the model diagnostics

Other IND~pressure modeling functions: find_id(), ind_init(), model_gamm(), plot_diagnostics(), plot_model(), scoring(), select_model(), test_interaction()

Examples

# Using the Baltic Sea demo data in this package dat_init <- ind_init( ind_tbl = ind_ex[, c("Sprat", "Cod")], press_tbl = press_ex[, c("Tsum", "Swin", "Fcod", "Fher")], time = ind_ex[ ,1]) gam_tbl <- model_gam(dat_init) # Any outlier? gam_tbl$pres_outlier
#> [[1]] #> NULL #> #> [[2]] #> NULL #> #> [[3]] #> NULL #> #> [[4]] #> NULL #> #> [[5]] #> NULL #> #> [[6]] #> NULL #> #> [[7]] #> 1991 #> 13 #> #> [[8]] #> 2000 #> 22 #>
# Exclude outliers by passing this list as input: gam_tbl_out <- model_gam(dat_init, excl_outlier = gam_tbl$pres_outlier) # \donttest{ # Using another error distribution ind_sub <- round(exp(ind_ex[ ,c(2,8,9)]),0) # to unlog data and convert to integers ind_tbl2 <- ind_init(ind_sub, press_ex, time = ind_ex$Year) model_gam(ind_tbl2, family = poisson(link="log"))
#> # A tibble: 24 × 16 #> id ind press model_type corrstruc aic edf p_val signif_code r_sq #> <int> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <dbl> #> 1 1 TZA Year gam none 111636. 4.00 0 *** 0.427 #> 2 2 TZA Tsum gam none 166179. 3.99 0 *** 0.256 #> 3 3 TZA Swin gam none 155118. 4.00 0 *** 0.237 #> 4 4 TZA Pwin gam none 228147. 3.96 0 *** -0.113 #> 5 5 TZA Nwin gam none 189112. 4.00 0 *** 0.0710 #> 6 6 TZA Fsprat gam none 96200. 4.00 0 *** 0.547 #> 7 7 TZA Fher gam none 144579. 3.99 0 *** 0.283 #> 8 8 TZA Fcod gam none 179934. 3.98 0 *** 0.0747 #> 9 9 Sprat Year gam none 88118. 3.99 0 *** 0.516 #> 10 10 Sprat Tsum gam none 174764. 3.99 0 *** -0.0198 #> # … with 14 more rows, and 6 more variables: expl_dev <dbl>, nrmse <dbl>, #> # ks_test <dbl>, tac <lgl>, pres_outlier <list>, model <list>
# }