`R/plot_diagnostics.R`

`plot_diagnostics.Rd`

`plot_diagnostics` takes a list of models of class `gam`, `gamm` or `thresh_gam`, or a mix of those, and produces diagnostic information on the fitting procedure and results. The function returns a tibble with 6 list-columns containing individual plots (ggplot2 objects) and one list-column containing a plot that shows all diagnostic plots together.

plot_diagnostics(model_list)

`model_list` | A list with models of class `gam(m)` and/or `thresh_gam`, e.g. the list-column |
---|---|

The function returns a `tibble`, which is a trimmed-down version of the `data.frame()`, including the following elements:

`ind`

Indicator names.

`press`

Pressure names.

`cooks_dist`

A list-column of ggplot2 objects that show the Cook's distance of all observations, which is a leave-one-out deletion diagnostic measuring the influence of each observation. Data points with a large Cook's distance (> 1) are considered to merit closer examination in the analysis.
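As a rough illustration of the > 1 rule of thumb, the base R sketch below computes Cook's distance with `stats::cooks.distance()` and flags influential observations. The toy data and the simple `lm` fit are assumptions made for the example; `plot_diagnostics` works on GAM-type models and its internal code may differ.

```r
# Toy data (hypothetical): a clean linear trend plus one extreme point
# at high x that does not follow the trend.
x <- c(1:10, 30)
y <- c(2 * (1:10) + rep(c(-0.3, 0.3), 5), 10)

# A plain linear model stands in for a fitted GAM here.
fit <- lm(y ~ x)

# Leave-one-out influence measure for every observation.
cd <- cooks.distance(fit)

# Observations exceeding the rule-of-thumb cutoff of 1.
flagged <- which(cd > 1)
print(flagged)
```

Observations returned in `flagged` are the ones that would deserve a closer look, for instance by refitting the model without them.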

`acf_plot`

A list-column of ggplot2 objects that show the autocorrelation function for the residuals. NAs in the time series due to real missing values, test data extraction or exclusion of outliers are explicitly considered.

`pacf_plot`

A list-column of ggplot2 objects that show the partial autocorrelation function for the residuals. NAs are explicitly considered.
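One way to keep NAs in place when computing an autocorrelation function is the `na.action = na.pass` argument of `stats::acf()`; `stats::pacf()` takes the same argument. The sketch below uses an invented residual series and is only an assumption about how such gaps can be handled, not a description of the package internals.

```r
# Hypothetical residual series with two gaps (e.g. removed outliers).
res <- sin(1:20)
res[c(5, 12)] <- NA

# The default na.action (na.fail) stops as soon as NAs are present.
default_fails <- inherits(try(acf(res, plot = FALSE), silent = TRUE), "try-error")

# na.pass keeps the NAs at their positions, so lags still refer to the
# original time steps instead of a shortened, re-indexed series.
a <- acf(res, lag.max = 5, na.action = na.pass, plot = FALSE)

print(default_fails)
print(round(drop(a$acf), 3))
```

Simply dropping the NAs before calling `acf()` would shift later observations forward and change which pairs of values count as lag 1, lag 2, and so on.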

`resid_plot`

A list-column of ggplot2 objects that show residuals vs. fitted values.

`qq_plot`

A list-column of ggplot2 objects that show the quantile-quantile plot for normality.

`gcvv_plot`

A list-column of ggplot2 objects that show, for a threshold-GAM, the development of the generalized cross-validation (GCV) value at different threshold levels of the modifying pressure variable. The GCV value of the final chosen threshold should be distinctly lower than for all other potential thresholds, i.e. the line should show a sharp negative peak at this threshold. If this is not the case, e.g. the trough is very wide with similar GCV values for nearby thresholds, the threshold-GAM is not optimal and should not be favored over a GAM despite the better LOOCV (leave-one-out cross-validation) value.
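The difference between a sharp and a wide trough can be made concrete with a small numeric check. The sketch below is purely illustrative: the GCV values are invented, and the helper `n_near_min()` with its 5% tolerance is a hypothetical heuristic, not something the package provides.

```r
# Invented GCV values over nine candidate thresholds (not real model output).
gcv_sharp <- c(1.00, 0.98, 0.97, 0.95, 0.60, 0.96, 0.97, 0.99, 1.00)
gcv_wide  <- c(1.00, 0.80, 0.62, 0.61, 0.60, 0.61, 0.62, 0.81, 1.00)

# Hypothetical helper: number of thresholds whose GCV lies within a
# relative tolerance `tol` of the minimum GCV.
n_near_min <- function(gcv, tol = 0.05) {
  sum(gcv <= min(gcv) * (1 + tol))
}

print(n_near_min(gcv_sharp))
print(n_near_min(gcv_wide))
```

A count of one corresponds to the sharp negative peak described above, while several near-minimal thresholds correspond to a wide trough in which the chosen threshold is poorly determined.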

`all_plots`

A list-column of ggplot2 objects that show all five (six if threshold-GAM) plots together. For this plot, drawing canvases from the `cowplot` package were added on top of ggplot2.

The function can deal with any model of the classes `gam`, `gamm` or `thresh_gam` as long as the input is a flat list. That means:

If only one model is provided as input, coerce the model explicitly to class `list`. An input such as `model_gam_ex[1, "model"]` will not work because its class is a tibble. Use `model_gam_ex$model[1]` instead.

If the input is one or more threshold-GAMs selected from the `test_interaction` output (variable `thresh_models`), the model list features a nested structure: each IND~pressure pair (row) might have more than one threshold-GAM. To remove the nested structure, use e.g. the `flatten` function (see examples).
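The difference between a nested and a flat model list can be sketched in base R. The objects below are empty placeholders created by a hypothetical helper `fake_model()`, not fitted models, and `unlist(recursive = FALSE)` is used here as a base R stand-in for removing one level of hierarchy.

```r
# Hypothetical helper: a placeholder object carrying only a class attribute.
fake_model <- function(cls, id) structure(list(id = id), class = cls)

# Nested: one element per IND~pressure pair, each holding one or more
# threshold-GAMs.
nested <- list(
  list(fake_model("thresh_gam", 1), fake_model("thresh_gam", 2)),
  list(fake_model("thresh_gam", 3))
)

# Remove one level of hierarchy so that every element is itself a model.
flat <- unlist(nested, recursive = FALSE)

print(length(nested))
print(length(flat))
print(all(vapply(flat, inherits, logical(1), what = "thresh_gam")))

# A single model still has to be wrapped in a list before it is passed on.
single <- list(fake_model("gam", 4))
```

In `nested` the top-level elements are plain lists, whereas in `flat` each element carries the model class directly, which is the shape a flat input list is expected to have.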

`cooks.distance`, `acf`, `pacf`, `qqnorm`, and `flatten` for removing a level hierarchy from a list

Other IND~pressure modeling functions: `find_id()`, `ind_init()`, `model_gamm()`, `model_gam()`, `plot_model()`, `scoring()`, `select_model()`, `test_interaction()`

    if (FALSE) {
      # Using some models of the Baltic Sea demo data:
      # Apply function to a list of various model types
      model_list <- c(all_results_ex$thresh_models[[5]],
        model_gam_ex$model[39], all_results_ex$model[76])
      plots <- plot_diagnostics(model_list)
      plots$cooks_dist[[1]]
      plots$acf_plot[[2]]
      plots$pacf_plot[[3]]
      plots$resid_plot[[1]]
      plots$qq_plot[[1]]
      plots$gcvv_plot[[1]] # for threshold models
      plots$all_plots[[1]] # shows all 5-6 plots

      # Make sure that thresh_models have not a nested list structure:
      model_list <- all_results_ex$thresh_models[5:6] %>% purrr::flatten(.)
      plots <- plot_diagnostics(model_list)
    }