`ind_init` combines the time vector and the indicator (IND) and pressure data into
one tibble with defined training and test observations. All INDs are combined
with all pressures provided as input.
ind_init(ind_tbl, press_tbl, time, train = 0.9, random = FALSE)
Argument | Description |
---|---|
ind_tbl | A data frame, matrix or tibble containing only the (numeric) IND variables. Single indicators should be coerced into a data frame to keep the indicator name. If kept as a vector, the default name will be `ind`. |
press_tbl | A data frame, matrix or tibble containing only the (numeric) pressure variables. Single pressures should be coerced into a data frame to keep the pressure name. If kept as a vector, the default name will be `press`. |
time | A vector containing the actual time steps (e.g. years; should be the same as in the IND and pressure data). |
train | The proportion of observations that should go into the training data on which the GAMs are later fitted. Has to be a numeric value between 0 and 1; the default is 0.9. |
random | logical; should the observations for the training data be randomly chosen? Default is FALSE, so that the last time units (years) are chosen as test data. |
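
The effect of `train` and `random` on the split can be sketched in base R. This is only an illustration under stated assumptions: `split_positions()` is a hypothetical helper that is not part of the package, the rounding rule is assumed, and the 30 years are made up.

```r
# Hypothetical helper, not part of the package: returns the positions of the
# training and test observations for a given time vector.
split_positions <- function(time, train = 0.9, random = FALSE) {
  stopifnot(is.numeric(train), train > 0, train < 1)
  n <- length(time)
  n_train <- round(n * train)  # assumption: simple rounding
  if (random) {
    # training observations are drawn at random across the whole series
    train_pos <- sort(sample.int(n, n_train))
  } else {
    # training data are the first time steps, test data the last ones
    train_pos <- seq_len(n_train)
  }
  list(train = train_pos, test = setdiff(seq_len(n), train_pos))
}

years <- 1979:2008  # 30 made-up time steps

# Default behaviour: the last years form the test data
chrono <- split_positions(years, train = 0.9)
years[chrono$test]

# Random assignment of half of the observations to the training data
set.seed(1)
rand <- split_positions(years, train = 0.5, random = TRUE)
lengths(rand)
```

With `random = FALSE` the test data form one block at the end of the series, whereas `random = TRUE` scatters them across the whole time period.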
The function returns a tibble, which is a trimmed-down version of
the data.frame(), including the following elements:
id
Numerical IDs for the IND~press combinations.
ind
Indicator names. These might be modified to exclude any character that cannot be used in the model formula (e.g. hyphens, brackets, etc. are replaced by an underscore; variables starting with a number get an x before the number).
press
Pressure names. These might be modified to exclude any character that cannot be used in the model formula (e.g. hyphens, brackets, etc. are replaced by an underscore; variables starting with a number get an x before the number).
ind_train
A list-column with indicator values of the training data.
press_train
A list-column with pressure values of the training data.
time_train
A list-column with the time steps of the training data.
ind_test
A list-column with indicator values of the test data.
press_test
A list-column with pressure values of the test data.
time_test
A list-column with the time steps of the test data.
train_na
logical; indicates the joint missing values in the training IND and pressure data. That includes the original NAs as well as randomly selected test observations that are within the training period. This vector is needed later for the determination of temporal autocorrelation.
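
The renaming described for the `ind` and `press` elements can be approximated as follows. `clean_names()` is a hypothetical helper written for this illustration, and the input names are made up; the package's own rules may differ in detail.

```r
# Hypothetical helper mirroring the described behaviour; the package's own
# implementation may differ.
clean_names <- function(x) {
  # replace every character other than letters, digits, "_" and "."
  x <- gsub("[^A-Za-z0-9_.]", "_", x)
  # prefix names that start with a digit with "x"
  sub("^([0-9])", "x\\1", x)
}

cleaned <- clean_names(c("Cod-SSB", "Chl(a)", "2nd_index", "Tsum"))
cleaned
# "Cod_SSB" "Chl_a_" "x2nd_index" "Tsum"
```

Names that are already valid, such as `Tsum`, pass through unchanged.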
`ind_init` will combine every column in ind_tbl with every column in press_tbl
so that each row will represent one IND~press combination. The input data will be
split into a training and a test data set. The returned tibble is the basis for all
IND~pressure modeling functions.
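
The all-by-all pairing can be pictured with a small base R sketch. It reproduces only the `id`, `ind` and `press` columns, and the way the package builds them internally is an assumption here.

```r
ind_names <- c("TZA", "MS")
press_names <- c("Tsum", "Swin", "Pwin")

# expand.grid() varies its first argument fastest, so the pressures are
# listed first to keep all rows of one indicator together.
grid <- expand.grid(press = press_names, ind = ind_names,
  stringsAsFactors = FALSE)

combos <- data.frame(
  id = seq_len(nrow(grid)),
  ind = grid$ind,
  press = grid$press,
  stringsAsFactors = FALSE
)
combos  # 2 indicators x 3 pressures = 6 rows
```

By the same logic, an output with 84 rows and 7 pressures per indicator, as in the demo data example, corresponds to 12 indicators.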
If not all IND~pressure combinations should be modeled,
the respective rows can simply be removed from the output tibble, or ind_init
can be applied multiple times on data subsets and the output tibbles merged later using
e.g. bind_rows.
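
Both options can be sketched with a small stand-in for the output tibble. The data frame below is made up and holds only the identifying columns. Base R's `rbind()` is used in place of `bind_rows` to keep the sketch free of dependencies.

```r
# Stand-in for the output of ind_init(); a real output also holds the
# list-columns with the training and test data.
init_tbl <- data.frame(
  id = 1:4,
  ind = c("TZA", "TZA", "MS", "MS"),
  press = c("Tsum", "Fcod", "Tsum", "Fcod"),
  stringsAsFactors = FALSE
)

# Option 1: drop the combinations that should not be modeled
keep <- !(init_tbl$ind == "MS" & init_tbl$press == "Fcod")
init_sub <- init_tbl[keep, ]

# Option 2: create the outputs separately and stack them afterwards
part_a <- init_tbl[init_tbl$ind == "TZA", ]
part_b <- init_tbl[init_tbl$ind == "MS" & init_tbl$press == "Tsum", ]
merged <- rbind(part_a, part_b)
```

When outputs of separate `ind_init` calls are merged, the `id` column is presumably numbered within each call, so checking it for duplicates after merging seems prudent.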
See tibble and the vignette("tibble") for more information on tibbles.
Other IND~pressure modeling functions: find_id(), model_gamm(), model_gam(), plot_diagnostics(), plot_model(), scoring(), select_model(), test_interaction()
    # Using the Baltic Sea demo data in this package
    press_tbl <- press_ex[ ,-1] # excl. Year
    ind_tbl <- ind_ex[ ,-1] # excl. Year
    time <- ind_ex[ ,1]

    # Assign randomly 50% of the observations as training data and
    # the other 50% as test data
    ind_init(ind_tbl, press_tbl, time, train = 0.5, random = TRUE)
    #> # A tibble: 84 × 10
    #>       id ind   press  ind_train  press_train time_train ind_test   press_test
    #>    <int> <chr> <chr>  <list>     <list>      <list>     <list>     <list>
    #>  1     1 TZA   Tsum   <dbl [15]> <dbl [15]>  <int [15]> <dbl [15]> <dbl [15]>
    #>  2     2 TZA   Swin   <dbl [15]> <dbl [15]>  <int [15]> <dbl [15]> <dbl [15]>
    #>  3     3 TZA   Pwin   <dbl [15]> <dbl [15]>  <int [15]> <dbl [15]> <dbl [15]>
    #>  4     4 TZA   Nwin   <dbl [15]> <dbl [15]>  <int [15]> <dbl [15]> <dbl [15]>
    #>  5     5 TZA   Fsprat <dbl [15]> <dbl [15]>  <int [15]> <dbl [15]> <dbl [15]>
    #>  6     6 TZA   Fher   <dbl [15]> <dbl [15]>  <int [15]> <dbl [15]> <dbl [15]>
    #>  7     7 TZA   Fcod   <dbl [15]> <dbl [15]>  <int [15]> <dbl [15]> <dbl [15]>
    #>  8     8 MS    Tsum   <dbl [15]> <dbl [15]>  <int [15]> <dbl [15]> <dbl [15]>
    #>  9     9 MS    Swin   <dbl [15]> <dbl [15]>  <int [15]> <dbl [15]> <dbl [15]>
    #> 10    10 MS    Pwin   <dbl [15]> <dbl [15]>  <int [15]> <dbl [15]> <dbl [15]>
    #> # … with 74 more rows, and 2 more variables: time_test <list>, train_na <list>

    # To keep the name when testing only one indicator and pressure, coerce both
    # vectors to data frames
    ind_init(ind_tbl = data.frame(MS = ind_tbl$MS),
      press_tbl = data.frame(Tsum = press_tbl$Tsum), time,
      train = .5, random = TRUE)
    #> # A tibble: 1 × 10
    #>      id ind   press ind_train  press_train time_train ind_test   press_test
    #>   <int> <chr> <chr> <list>     <list>      <list>     <list>     <list>
    #> 1     1 MS    Tsum  <dbl [15]> <dbl [15]>  <int [15]> <dbl [15]> <dbl [15]>
    #> # … with 2 more variables: time_test <list>, train_na <list>