Abstract #464

Comparing deep learning methods versus parametric and ensemble methods for the prediction of complex phenotypes.
R. Abdollahi-Arpanahi*¹, F. Peñagaricano¹; ¹University of Florida, Gainesville, FL.

Transforming large amounts of genomic data into valuable knowledge for predicting complex traits has been one of the most important challenges for animal and plant breeders. The prediction of complex traits has not escaped the current excitement around machine learning, including a renewed interest in “deep learning” techniques such as multilayer perceptrons (MLPs) and convolutional neural networks (CNNs). The main goal of this study was to compare the performance of 2 deep learning methods (MLP and CNN), 2 ensemble learning methods (random forest [RF] and gradient boosting [GB]), and 2 parametric methods (genomic best linear unbiased prediction [GBLUP] and BayesB) for predicting a complex phenotype, namely sire conception rate (SCR). A data set consisting of 11,790 Holstein bulls with SCR records and 55k SNP markers was used. Model predictive ability was assessed using 5-fold cross-validation and measured as the Pearson correlation between predicted and observed values and as the mean squared error of prediction. The best predictive correlation was obtained with GB (0.36), followed by BayesB (0.35), GBLUP (0.34), RF (0.29), CNN (0.29), and MLP (0.27). The same trend was observed for mean squared error of prediction. To provide a better evaluation of deep learning methods, simulation studies were conducted using the observed genotype data, assuming a heritability of 0.30 and 100 quantitative trait nucleotides (QTNs) with either additive or non-additive genetic effects. When the genetic architecture of the simulated trait was purely additive, BayesB (0.88) and GB (0.79) outperformed the other methods. When the genetic architecture was a combination of additive, dominance, and epistatic effects, the best predictive ability was obtained with GB (0.81), followed by BayesB (0.71), GBLUP (0.67), RF (0.61), MLP (0.60), and CNN (0.59). Overall, GB is a robust method for predicting complex traits. The effective use of deep learning approaches for genomic prediction needs further research.
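
The evaluation scheme described above (a simulated additive trait with 100 QTNs and a heritability of 0.30, scored by 5-fold cross-validation with Pearson correlation and mean squared error) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the analysis used in the study: the genotypes are randomly generated rather than the observed Holstein data, the data set is far smaller, ridge regression on markers stands in for the parametric methods, and all function names, the shrinkage value `lam`, and the allele frequency range are hypothetical.

```python
import numpy as np

def simulate_genotypes(n_animals, n_snps, rng):
    """Hypothetical genotypes coded 0/1/2, drawn independently per SNP."""
    freqs = rng.uniform(0.1, 0.9, size=n_snps)  # assumed allele frequencies
    return rng.binomial(2, freqs, size=(n_animals, n_snps)).astype(float)

def simulate_additive_phenotype(X, n_qtn, h2, rng):
    """Additive trait: n_qtn causal SNPs, residual variance scaled to target h2."""
    qtn = rng.choice(X.shape[1], size=n_qtn, replace=False)
    effects = rng.normal(size=n_qtn)
    g = X[:, qtn] @ effects                      # true genetic values
    var_e = g.var() * (1.0 - h2) / h2            # so that var_g / (var_g + var_e) = h2
    return g + rng.normal(scale=np.sqrt(var_e), size=X.shape[0])

def ridge_fit_predict(X_train, y_train, X_test, lam):
    """Ridge regression on centered markers (a stand-in, not the study's models)."""
    mu_x = X_train.mean(axis=0)
    mu_y = y_train.mean()
    Z = X_train - mu_x
    # Dual form: solve an n x n system, cheaper when SNPs outnumber animals.
    K = Z @ Z.T + lam * np.eye(Z.shape[0])
    alpha = np.linalg.solve(K, y_train - mu_y)
    return mu_y + (X_test - mu_x) @ (Z.T @ alpha)

def cross_validate(X, y, k, lam, rng):
    """k-fold CV returning mean Pearson correlation and mean squared error."""
    folds = np.array_split(rng.permutation(len(y)), k)
    cors, mses = [], []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        y_hat = ridge_fit_predict(X[train_idx], y[train_idx], X[test_idx], lam)
        cors.append(np.corrcoef(y_hat, y[test_idx])[0, 1])
        mses.append(np.mean((y_hat - y[test_idx]) ** 2))
    return float(np.mean(cors)), float(np.mean(mses))

rng = np.random.default_rng(464)
X = simulate_genotypes(n_animals=500, n_snps=1000, rng=rng)
y = simulate_additive_phenotype(X, n_qtn=100, h2=0.30, rng=rng)
mean_cor, mean_mse = cross_validate(X, y, k=5, lam=1000.0, rng=rng)
print(f"mean predictive correlation: {mean_cor:.3f}, mean MSE: {mean_mse:.3f}")
```

Averaging the two metrics over folds is one common convention; pooling predictions across folds before computing a single correlation is another, and the abstract does not state which was used. Non-additive architectures would require adding dominance and epistatic terms to the simulated genetic value, which this sketch omits.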

Key Words: convolutional neural networks, genomic prediction, machine learning