Abstract #469

Bayesian whole-genome prediction and genome-wide association analysis with missing genotypes using variable selection.
C. Chen*1, K. A. Weigel2, E. E. Connor3, D. M. Spurlock4, M. J. VandeHaar1, C. R. Staples5, R. J. Tempelman1, 1Michigan State University, East Lansing, MI, 2University of Wisconsin-Madison, Madison, WI, 3USDA-ARS, Beltsville, MD, 4Iowa State University, Ames, IA, 5University of Florida, Gainesville, FL.

Single-step genomic best linear unbiased predictor (ssGBLUP) has become increasingly popular for whole-genome prediction (WGP) modeling because it uses all available pedigree information and phenotypes on both genotyped and non-genotyped individuals. The WGP accuracy of ssGBLUP has been demonstrated to be greater than or equivalent to that of popular Bayesian regression models. However, these assessments have not typically included phenotypes of non-genotyped individuals in the Bayesian regression analyses, making the comparisons difficult to interpret. Increasingly, ssGBLUP has been used for genome-wide association (GWA) studies, although there is no clear guidance on how to determine statistical significance in these analyses. We address this issue and additionally propose a GWA based on a Bayesian single-step stochastic search variable selection (ssSSVS) model that allows for phenotypes on non-genotyped animals. Our study was based on a dairy consortium data set of 3,186 Holstein cows from 6 US research stations, genotyped with the 60,671 USDA-ARS bovine SNP panel. In a replicated simulation study using these same genotypes, causal variants (nc = 30, 300, or 3,000) were randomly assigned to the markers and 20% of cows were masked as non-genotyped, for a trait with a heritability of 0.25. We determined that ssSSVS had greater (P < 0.05) WGP accuracy than ssGBLUP with nc = 30 or nc = 300. Moreover, ssSSVS always performed better (P < 0.05) than ssGBLUP for GWA, measured as the partial area under the receiver-operating characteristic (ROC) curve (pAUC) up to a false positive rate of 5%. In a 10-fold within-station cross-validation study using phenotypes from the dairy consortium, we determined that ssSSVS had greater (P < 0.05) WGP accuracy than ssGBLUP for milk fat in genotyped individuals, although no such difference was detected for body weight. No differences in prediction accuracy between ssSSVS and ssGBLUP were detected for non-genotyped individuals for either trait. Overall, ssSSVS is a promising method for both WGP and GWA, particularly for genetic architectures characterized by a few genes with large effects.
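
The pAUC criterion used above for the simulated GWA comparison can be illustrated with a minimal sketch. The function name `partial_auc`, its arguments, and the use of a generic per-marker score are assumptions made for illustration only; the abstract does not state which association statistic or software the authors used. The sketch ranks markers by score, traces the ROC curve (tied scores are handled as one step), and integrates the true positive rate over false positive rates from 0 to the cutoff with the trapezoidal rule.

```python
def partial_auc(scores, is_causal, max_fpr=0.05):
    """Area under the ROC curve for false positive rates in [0, max_fpr].

    scores    : one association score per marker (larger = stronger evidence);
                hypothetical input, e.g. a posterior inclusion probability.
    is_causal : True/False per marker, known only in a simulation.
    """
    n_pos = sum(1 for c in is_causal if c)
    n_neg = len(is_causal) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one causal and one non-causal marker")

    # Rank markers from strongest to weakest evidence.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Build ROC points, moving the threshold past each group of tied scores.
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(order):
        s = scores[order[i]]
        while i < len(order) and scores[order[i]] == s:
            if is_causal[order[i]]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)

    # Trapezoidal integration, truncated (with interpolation) at max_fpr.
    area = 0.0
    for k in range(1, len(fpr)):
        x0, x1 = fpr[k - 1], fpr[k]
        y0, y1 = tpr[k - 1], tpr[k]
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Under this definition, a method that ranks every causal marker ahead of every non-causal marker reaches the maximum pAUC, which equals the cutoff itself (0.05 for a 5% false positive rate), so values from different methods can be compared on that scale.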

Key Words: Bayesian variable selection, genome-wide association, whole-genome prediction