## Abstract #4

**Section:** NANP Nutrition Models Workshop

**Session:** Nutrition Models Workshop

**Format:** Oral

**Day/Time:** Sunday 11:05 AM–11:35 AM

**Location:** Room 201/202

**Cross-validation and bootstrapping: Part I (lecture).** J. A. D. R. N. Appuhamy

^{*1}, L. E. Moraes^{2}; ^{1}Department of Animal Science, Iowa State University, Ames, IA; ^{2}Department of Animal Science, The Ohio State University, Columbus, OH.

Cross-validation (CV) and bootstrapping are resampling methods that refit a model to samples drawn from the data. CV helps in choosing a “best” model, the one associated with the lowest prediction error rate, whereas bootstrapping allows determining the uncertainty of parameter estimates. One may be tempted to use the whole data set to develop and evaluate a model simultaneously. This approach, however, is prone to overfitting and thus selects models that would potentially fail on an independent data set. These limitations can be overcome with CV. Traditionally, CV is applied by splitting the data into 2 sets, training and test, that are used for model development and evaluation, respectively. This method, called hold-out, is not recommended, particularly for small data sets, as the error rate depends exclusively on the split and could be misleading for a different split. Data-splitting methods such as K-fold and leave-one-out CV are recommended to overcome those limitations. K-fold CV involves dividing the data into K subsets and holding each one out in turn as the test set to determine the error rate. In leave-one-out CV, only one observation is held out at a time as the test set. In both cases, the true error rate for models with continuous responses is generally estimated as the average of the separate error estimates.

Bootstrapping is a powerful statistical tool involving resampling with replacement and is commonly used to quantify the bias, standard error, and confidence interval of statistical estimates. Traditionally, the uncertainty of model parameters is estimated by deriving the sampling distribution based on assumptions about the distribution of the population.
In contrast, bootstrapping allows estimating the uncertainty without explicitly deriving the sampling distribution, although it is important to keep in mind that the method relies on the bootstrap principle: “sampling with replacement behaves on the original sample the way the original sample behaves on a population.” This lesson will cover the principles and implementations of CV and bootstrapping for models frequently used in animal nutrition.
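
As a concrete illustration of the K-fold procedure described above, the following is a minimal Python sketch. The synthetic data, the simple linear-regression model, and the choice of K = 5 are hypothetical assumptions for illustration, not material from the lecture:

```python
import random
import statistics

random.seed(42)

# Hypothetical synthetic data (illustration only): y = 2 + 3x + noise
x = [random.uniform(0, 10) for _ in range(100)]
y = [2 + 3 * xi + random.gauss(0, 1) for xi in x]

def fit_ols(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
             / sum((a - xbar) ** 2 for a in xs))
    return ybar - slope * xbar, slope

def kfold_cv_mse(xs, ys, k=5):
    """K-fold CV: hold each fold out in turn as the test set, refit on
    the remaining folds, and average the K held-out mean squared errors."""
    idx = list(range(len(xs)))
    random.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    fold_mses = []
    for fold in folds:
        held_out = set(fold)
        xtr = [xs[i] for i in idx if i not in held_out]
        ytr = [ys[i] for i in idx if i not in held_out]
        b0, b1 = fit_ols(xtr, ytr)
        fold_mses.append(statistics.mean(
            (ys[i] - (b0 + b1 * xs[i])) ** 2 for i in fold))
    return statistics.mean(fold_mses)  # CV estimate of prediction error

cv_error = kfold_cv_mse(x, y, k=5)
```

Setting `k` equal to the number of observations turns this into leave-one-out CV, where each fold holds out a single observation.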

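The bootstrap computation of bias, standard error, and a confidence interval described above can also be sketched in minimal Python. The synthetic data, the slope statistic, and the 2,000 resamples are hypothetical illustration choices, not material from the lecture:

```python
import random
import statistics

random.seed(7)

# Hypothetical synthetic data (illustration only); true slope is 1.5
n = 60
x = [random.uniform(0, 10) for _ in range(n)]
y = [1.5 * xi + random.gauss(0, 2) for xi in x]

def slope(xs, ys):
    """Least-squares slope of y on x (the statistic being bootstrapped)."""
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    return (sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
            / sum((a - xbar) ** 2 for a in xs))

original_estimate = slope(x, y)

# Bootstrap: resample (x, y) PAIRS with replacement, refit, collect estimates
boot_estimates = []
for _ in range(2000):
    pick = [random.randrange(n) for _ in range(n)]
    boot_estimates.append(slope([x[i] for i in pick], [y[i] for i in pick]))

boot_se = statistics.stdev(boot_estimates)  # bootstrap standard error
boot_bias = statistics.mean(boot_estimates) - original_estimate  # bootstrap bias
boot_estimates.sort()
ci_95 = (boot_estimates[50], boot_estimates[1949])  # percentile 95% CI
```

No assumption about the population distribution is needed here: the spread of the 2,000 refitted slopes stands in for the sampling distribution of the estimator, exactly as the bootstrap principle suggests.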
**Key Words:** model evaluation, prediction error, resampling