Abstract #M69

Haplotype-based methods to select animals to sequence for later accurate imputation.
A. M. Butty1, M. Sargolzaei1,2, F. Miglior1, P. Stothard3, F. S. Schenkel1, B. Gredler-Grandl4,5, C. F. Baes*1,6, 1University of Guelph, Guelph, ON, Canada, 2Select Sires Inc, Plain City, OH, 3University of Alberta, Edmonton, AB, Canada, 4Qualitas AG, Zug, Switzerland, 5Wageningen University, Wageningen, the Netherlands, 6University of Bern, Bern, Switzerland.

The availability of array genotypes in dairy cattle has increased steadily in the last decade, and imputation to whole-genome sequence (WGS) has been widely studied. Although variants with minor allele frequency (MAF) below 0.05 usually represent more than half of all WGS variants identified, they are excluded in most studies. Imputation of such rare variants is often inaccurate, which impedes their use in further analyses. The selection of the reference population also has a large impact on imputation accuracy. In this study, we present 2 novel selection methods that rely on haplotype information and compare them with 2 previously described methods. The Genetic Diversity Index method optimizes the number of unique haplotype alleles present in the selected group of animals, whereas the Highly Segregating Haplotype method aims to capture as many haplotype alleles as possible, starting with alleles of high frequency in the population. We first simulated WGS data of a dairy cattle population, mimicking the MAF distribution and the linkage disequilibrium pattern found in the North American Holstein population. Reference populations of 50 to 1,200 animals were created using the 4 selection methods. Finally, a group of target animals with simulated high-density genotypes was imputed. Imputation accuracy was measured as the allelic r² between true and imputed genotypes and compared across variants of different MAF. Imputation accuracy for common variants was between 0.85 and 0.99, whereas imputation accuracy for rare variants varied between 0.40 and 0.91. In general, methods that select animals for their genetic diversity led to better imputation accuracy for variants with a MAF below 0.05, whereas methods targeting animals that carry common haplotype alleles led to higher imputation accuracy for variants with higher MAF. Therefore, the intended use of the imputed WGS must be accounted for when selecting the animals that will form the future reference population.
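The accuracy measure used above can be illustrated with a minimal sketch. It assumes that allelic r² is computed per variant as the squared Pearson correlation between true and imputed allele dosages (0, 1, or 2 copies), and that variants are split into rare and common classes at the MAF threshold of 0.05 mentioned in the abstract. The function names and the toy genotypes are hypothetical and are not taken from the study.

```python
def allelic_r2(true_geno, imputed_geno):
    """Squared Pearson correlation between true and imputed allele dosages.

    Assumption: genotypes are coded as allele counts (0, 1, 2) per animal.
    Returns None when either vector has no variation (r2 is undefined).
    """
    n = len(true_geno)
    mean_t = sum(true_geno) / n
    mean_i = sum(imputed_geno) / n
    cov = sum((t - mean_t) * (i - mean_i)
              for t, i in zip(true_geno, imputed_geno))
    var_t = sum((t - mean_t) ** 2 for t in true_geno)
    var_i = sum((i - mean_i) ** 2 for i in imputed_geno)
    if var_t == 0 or var_i == 0:
        return None
    return cov * cov / (var_t * var_i)


def minor_allele_frequency(geno):
    """MAF of one variant, computed from allele counts of diploid animals."""
    freq = sum(geno) / (2 * len(geno))
    return min(freq, 1 - freq)


def mean_r2_by_maf_class(true_by_variant, imputed_by_variant, threshold=0.05):
    """Average allelic r2 separately for rare (MAF < threshold) and common variants."""
    totals = {"rare": [0.0, 0], "common": [0.0, 0]}
    for true_geno, imputed_geno in zip(true_by_variant, imputed_by_variant):
        r2 = allelic_r2(true_geno, imputed_geno)
        if r2 is None:
            continue  # skip variants where r2 cannot be computed
        maf_class = "rare" if minor_allele_frequency(true_geno) < threshold else "common"
        totals[maf_class][0] += r2
        totals[maf_class][1] += 1
    return {k: (s / c if c else None) for k, (s, c) in totals.items()}


# Toy data (invented): 20 animals, one common and one rare variant.
true_common = [0, 1, 2, 1] * 5          # MAF = 0.5
imputed_common = [0, 1, 2, 1] * 5       # imputed perfectly
true_rare = [1] + [0] * 19              # MAF = 0.025
imputed_rare = [1, 1] + [0] * 18        # one false heterozygote

result = mean_r2_by_maf_class(
    [true_common, true_rare],
    [imputed_common, imputed_rare],
)
print(result)
```

In this toy case a single wrongly imputed heterozygote lowers the r² of the rare variant considerably, while the common variant is unaffected, which is consistent with the wider accuracy range reported for rare variants.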

Key Words: sequencing, imputation, haplotype