model_gam
applies Generalized Additive Models (GAMs) to each IND~pressure
combination created in ind_init
and returns a tibble with
IND~pressure-specific GAM outputs.
model_gam(init_tbl, k = 5, family = stats::gaussian(), excl_outlier = NULL)
init_tbl | The output tibble of the |
---|---|
k | Choice of knots (for the smoothing function |
family | A description of the error distribution and link to be used in the GAM.
This needs to be defined as a family function (see also |
excl_outlier | A list of values identified as outliers in specific
IND~pressure GAMs, which should be excluded in this modeling step
(the output tibble of this function includes the variable
`pres_outlier`, which is a column-list containing
all indices of values with cook`s distance > 1 (see below). The function
can be re-run again, then excluding all these outliers provided in
|
The function returns a tibble
, which is a trimmed down version of
the data.frame(), including the following elements:
id
Numerical IDs for the IND~press combinations.
ind
Indicator names.
press
Pressure names.
model_type
Specification of the model type; at this stage containing only "gam" (Generalized Additive Model).
corrstruc
Specification of the correlation structure; at this stage containing only "none".
aic
AIC of the fitted models
edf
Estimated degrees of freedom for the model terms.
p_val
The p values for the smoothing term (the pressure).
signif_code
The significance codes for the p-values.
r_sq
The adjusted r-squared for the models. Defined as the proportion of variance explained, where original variance and residual variance are both estimated using unbiased estimators. This quantity can be negative if your model is worse than a one parameter constant model, and can be higher for the smaller of two nested models.
expl_dev
The proportion of the null deviance explained by the models.
nrmse
Absolute values of the root mean square error normalized by the standard deviation (NRMSE).
ks_test
The p-values from a Kolmogorov-Smirnov Test applied on the model residuals to test for normal distribution. P-values > 0.05 indicate normally distributed residuals.
tac
logical; indicates whether temporal autocorrelation (TAC) was detected in the residuals. TRUE if model residuals show TAC. NAs in the time series due to real missing values, test data extraction or exclusion of outliers are explicitly considered. The test is based on the following condition: if any of the acf and pacf values of lag 1 - 5 are greater than 0.4 or lower than -0.4, a TRUE is returned.
pres_outlier
A list-column with all indices of values identified as outliers in each model (i.e.cook`s distance > 1). The indices present the position in the training data, including NAs.
excl_outlier
A list-column listing all outliers per model that have been excluded in the GAM fitting
model
A list-column of IND~press-specific gam objects that contain additionally
the logical vector indicating missing values ($train_na
).
To evaluate the IND`s sensitivity and robustness time series of the IND are
modeled as a smoothing function of one single pressure variable (using a subset
of the data as training dataset, e.g. excluding the years of the annual time series).
The GAMs are build using the default settings in the gam
function and
the smooth term function s
). However, the user can adjust
the distribution and link by modifying the family argument as well as the
maximum level of non-linearity by setting the number of knots:
gam(ind ~ s(press, k = k), family = family, data = training_data)
In the presence of significant temporal auto-correlation, GAMs should be extended to
Generalized Additive Mixed Models (GAMMs) by including auto-regressive error structures
to correct for the auto-correlation (Pinheiro and Bates, 2000). This is implemented in
the function model_gamm
.
The returned tibble contains various model outputs needed for scoring the sensitivity and robustness subcriteria:
p_val
to identify whether an IND responds to a specific pressure
r_sq
for the strength of the IND response
edf
for the non-linearity of the IND response
nrmse
for the robustness of the established IND~pressure relationship
The robustness of the modeled pressure relationship based on the training data is evaluated by measuring how well the model prediction matches the test dataset, e.g. the last years. This is quantified by computing the absolute value of the normalized root mean square error (NRMSE) on the test dataset. The normalization to the mean of the observed test data allows for comparisons and a general scoring of the model robustness across INDs with different scales or units.
Pinheiro, J.C., Bates, D.M. (2000) Mixed-Effects Models in S and S-Plus. Springer, New York, 548pp.
tibble
and the vignette("tibble")
for more
informations on tibbles,
gam
for more information on GAMs, and
plot_diagnostics
for assessing the model diagnostics
Other IND~pressure modeling functions:
find_id()
,
ind_init()
,
model_gamm()
,
plot_diagnostics()
,
plot_model()
,
scoring()
,
select_model()
,
test_interaction()
# Using the Baltic Sea demo data in this package dat_init <- ind_init( ind_tbl = ind_ex[, c("Sprat", "Cod")], press_tbl = press_ex[, c("Tsum", "Swin", "Fcod", "Fher")], time = ind_ex[ ,1]) gam_tbl <- model_gam(dat_init) # Any outlier? gam_tbl$pres_outlier#> [[1]] #> NULL #> #> [[2]] #> NULL #> #> [[3]] #> NULL #> #> [[4]] #> NULL #> #> [[5]] #> NULL #> #> [[6]] #> NULL #> #> [[7]] #> 1991 #> 13 #> #> [[8]] #> 2000 #> 22 #># Exclude outliers by passing this list as input: gam_tbl_out <- model_gam(dat_init, excl_outlier = gam_tbl$pres_outlier) # \donttest{ # Using another error distribution ind_sub <- round(exp(ind_ex[ ,c(2,8,9)]),0) # to unlog data and convert to integers ind_tbl2 <- ind_init(ind_sub, press_ex, time = ind_ex$Year) model_gam(ind_tbl2, family = poisson(link="log"))#> # A tibble: 24 × 16 #> id ind press model_type corrstruc aic edf p_val signif_code r_sq #> <int> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <dbl> #> 1 1 TZA Year gam none 111636. 4.00 0 *** 0.427 #> 2 2 TZA Tsum gam none 166179. 3.99 0 *** 0.256 #> 3 3 TZA Swin gam none 155118. 4.00 0 *** 0.237 #> 4 4 TZA Pwin gam none 228147. 3.96 0 *** -0.113 #> 5 5 TZA Nwin gam none 189112. 4.00 0 *** 0.0710 #> 6 6 TZA Fsprat gam none 96200. 4.00 0 *** 0.547 #> 7 7 TZA Fher gam none 144579. 3.99 0 *** 0.283 #> 8 8 TZA Fcod gam none 179934. 3.98 0 *** 0.0747 #> 9 9 Sprat Year gam none 88118. 3.99 0 *** 0.516 #> 10 10 Sprat Tsum gam none 174764. 3.99 0 *** -0.0198 #> # … with 14 more rows, and 6 more variables: expl_dev <dbl>, nrmse <dbl>, #> # ks_test <dbl>, tac <lgl>, pres_outlier <list>, model <list># }