Abstract #201

Predicting bull fertility using genomic data and biological information.
R. Abdollahi-Arpanahi¹,², G. Morota³, F. Peñagaricano*¹; ¹University of Florida, Gainesville, FL; ²University of Tehran, Tehran, Iran; ³University of Nebraska-Lincoln, Lincoln, NE.

The use of genomic data has revolutionized the prediction of complex traits in animal breeding over the last decade. Genomic prediction is generally considered a black box because it ignores any available information about functional features of the genome. However, integrating biological information could make genomic prediction more accurate and more persistent. Thus, the main objective of this study was to evaluate alternative models for predicting a complex trait, bull fertility, using both genomic and biological information. Sire conception rate (SCR) was used as a measure of bull fertility. The data set included 8k Holstein bulls with SCR records and genotypes for 55k single nucleotide polymorphisms (SNPs) spanning the whole genome. Different subsets of SNPs were evaluated: SNPs within or near genic regions (n = 26k), SNPs linked to genes in the Gene Ontology (GO) term reproduction (n = 0.9k), SNPs linked to genes in Medical Subject Headings (MeSH) terms related to sperm biology (n = 0.3k), and SNPs marginally associated with SCR (n = 18k). Both linear and Gaussian kernels were constructed for each set of SNPs and fitted either separately (single-kernel models) or simultaneously (multi-kernel models). Predictive ability was evaluated by mean-squared error (MSE) and predictive correlation (COR) in 5-fold cross-validation. Interestingly, the entire set of SNPs achieved good SCR predictions in the testing set (MSE = 4.13, COR = 0.35). Neither the genic-region SNPs nor the GO or MeSH gene sets achieved higher predictive ability than random sets of SNPs. Notably, kernel models fitting the marginally associated SNPs showed better predictive ability (MSE = 4.04, COR = 0.36) than the whole-genome approach in both single- and multi-kernel analyses. Models fitting Gaussian kernels outperformed their counterparts fitting linear kernels irrespective of the set of SNPs.
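The single- and multi-kernel models evaluated with 5-fold cross-validation can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear kernel uses VanRaden-style scaling, the Gaussian kernel divides squared Euclidean distances by the number of SNPs with a hypothetical bandwidth `h`, kernel ridge regression with a fixed shrinkage parameter `lam` stands in for whatever model-fitting procedure the study used, and the multi-kernel model simply averages two kernels with equal weights instead of estimating a variance component per kernel. The genotypes, phenotypes, and SNP split are simulated, not the study data.

```python
import numpy as np

def linear_kernel(X):
    """Linear (genomic relationship) kernel from an n x p matrix of 0/1/2 genotypes."""
    freq = X.mean(axis=0) / 2.0                 # allele frequency of each SNP
    Z = X - 2.0 * freq                          # center genotypes by twice the frequency
    scale = 2.0 * np.sum(freq * (1.0 - freq))   # VanRaden-style scaling (assumption)
    return Z @ Z.T / scale

def gaussian_kernel(X, h=1.0):
    """Gaussian kernel exp(-h * d_ij^2 / p); the bandwidth h is hypothetical."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    return np.exp(-h * d2 / X.shape[1])

def cross_validate(K, y, lam=1.0, n_folds=5, seed=0):
    """k-fold CV of kernel ridge regression; returns mean MSE and mean COR over folds."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds   # random, balanced fold labels
    mse, cor = [], []
    for k in range(n_folds):
        test = folds == k
        train = ~test
        mu = y[train].mean()
        K_tt = K[np.ix_(train, train)]
        alpha = np.linalg.solve(K_tt + lam * np.eye(train.sum()), y[train] - mu)
        pred = mu + K[np.ix_(test, train)] @ alpha   # predictions for held-out animals
        mse.append(np.mean((y[test] - pred) ** 2))
        cor.append(np.corrcoef(y[test], pred)[0, 1])
    return float(np.mean(mse)), float(np.mean(cor))

# Simulated data (not the study data): 200 animals, 500 SNPs, 20 causal SNPs.
rng = np.random.default_rng(1)
n, p = 200, 500
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
beta = np.zeros(p)
beta[:20] = rng.normal(0.0, 0.5, size=20)
y = X @ beta + rng.normal(0.0, 1.0, size=n)

K_lin = linear_kernel(X)
K_gau = gaussian_kernel(X)
# Hypothetical multi-kernel model: two SNP subsets, kernels averaged with equal weights.
K_multi = 0.5 * (gaussian_kernel(X[:, :250]) + gaussian_kernel(X[:, 250:]))

mse_lin, cor_lin = cross_validate(K_lin, y)
mse_gau, cor_gau = cross_validate(K_gau, y)
mse_multi, cor_multi = cross_validate(K_multi, y)
print(f"linear:       MSE={mse_lin:.2f} COR={cor_lin:.2f}")
print(f"gaussian:     MSE={mse_gau:.2f} COR={cor_gau:.2f}")
print(f"multi-kernel: MSE={mse_multi:.2f} COR={cor_multi:.2f}")
```

Reusing the same `seed` gives every kernel identical folds, so MSE and COR are compared on the same training/testing splits. In practice `lam` (or the kernel-specific variance components) would be estimated from the data rather than fixed.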
Overall, our findings suggest that genomic prediction of bull fertility is feasible in dairy cattle. Pre-filtering SNPs based on their marginal associations with SCR appears to be a promising alternative to fitting the whole set of SNPs. The potential inclusion of gene set results in prediction models deserves further research.
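Pre-filtering by marginal association can be sketched as one single-SNP regression per marker followed by a p-value threshold. This is a hedged illustration, not the study's procedure: the p-values use a normal approximation to the t statistic, the threshold `alpha = 0.05` is hypothetical (the abstract does not state the cut-off used to retain the 18k SNPs), and the data are simulated. In this sketch the screening uses the training animals only, so held-out animals do not influence which SNPs enter the kernel.

```python
import numpy as np
from math import erfc, sqrt

def marginal_pvalues(X, y):
    """Two-sided p-values for single-SNP regressions of y on each column of X."""
    n = len(y)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    sxx = np.sum(Xc ** 2, axis=0)
    sxy = Xc.T @ yc
    syy = np.sum(yc ** 2)
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.where(sxx > 0, sxy / np.sqrt(sxx * syy), 0.0)  # monomorphic SNPs get r = 0
    r = np.clip(r, -0.999999, 0.999999)
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))                 # t statistic of the slope
    # Normal approximation to the t distribution (assumption; reasonable for large n)
    return np.array([erfc(abs(tj) / sqrt(2.0)) for tj in t])

def prefilter_snps(X_train, y_train, alpha=0.05):
    """Indices of SNPs whose marginal p-value is below the hypothetical threshold alpha."""
    return np.flatnonzero(marginal_pvalues(X_train, y_train) < alpha)

# Simulated data (not the study data): 300 animals, 1000 SNPs, first 10 SNPs causal.
rng = np.random.default_rng(2)
n, p = 300, 1000
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
beta = np.zeros(p)
beta[:10] = 1.0
y = X @ beta + rng.normal(0.0, 1.0, size=n)

train = rng.permutation(n) < int(0.8 * n)   # boolean mask: 80% of animals for training
selected = prefilter_snps(X[train], y[train])
X_selected = X[:, selected]                  # genotypes that would be used to build kernels
print(f"{selected.size} of {p} SNPs retained")
```

The retained columns in `X_selected` could then be passed to any kernel constructor; within a cross-validation loop, the screening would be repeated inside each training fold.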

Key Words: prediction of complex traits, gene set, functional genomics