Abstract #463

A Genetic Diversity Index method to improve imputation accuracies of rare variants.
A. M. Butty*1, F. Miglior1,2, P. Stothard3, F. S. Schenkel1, B. Gredler4, M. Sargolzaei1,5, C. F. Baes1, 1Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada, 2Canadian Dairy Network, Guelph, ON, Canada, 3Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada, 4Qualitas AG, Zug, ZG, Switzerland, 5Semex Alliance, Guelph, ON, Canada.

Different methods to select animals for sequencing have been developed, which rely on pedigree-based relationship matrices, genomic relationship matrices, or haplotype frequencies. Relationship-based methods select representative key animals of a population, whereas haplotype frequency methods aim for better coverage of rare variants. Good average accuracies of imputation from SNP chip to whole-genome sequence (WGS) were reached for common haplotypes with the relationship-based methods. Imputation of rare variants, however, still needs to be improved, which can possibly be accomplished with a newly developed Genetic Diversity Index (GDI). This algorithm optimizes the count of unique haplotypes present in a group of animals composed of already sequenced individuals and a fixed number of sequencing candidates. Optimization is run iteratively, exchanging one candidate at a time and computing the GDI of the new group. A simulated annealing algorithm determines whether the last individual added to the group is kept. Simulated annealing has the advantage of searching for a global optimum in a situation where multiple local optima are present. The previously mentioned key ancestor and haplotype-based methods for selecting sequencing candidates were assessed and compared with the GDI algorithm using simulated cattle WGS data. Average squared correlation coefficients were used to assess imputation accuracy. A preliminary study showed that imputation accuracy was 1.5% higher when the reference population was enlarged using the GDI than when it was enlarged using the second-best method. Application of the different selection methods to North American Holstein data showed that the GDI algorithm selected animals carrying a higher percentage of rare haplotypes than the other methods examined. Principal component analysis of the population showed that the animals selected with all tested methods were similarly distributed over the pool of candidates.
When representative animals of a population had already been sequenced and good overall imputation accuracies had been reached, sequencing genetically diverse animals improved the accuracy of imputation of rare variants to the WGS density level.
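
The iterative exchange described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact GDI formula, so the sketch assumes the GDI is simply the number of distinct haplotypes carried by the group. The function names (`gdi`, `select_candidates`), the data layout (a dictionary mapping each animal to a set of haplotype identifiers), and the cooling parameters are all hypothetical choices made for illustration.

```python
import math
import random


def gdi(group, haplotypes):
    """Assumed GDI: number of distinct haplotypes carried by the group."""
    unique = set()
    for animal in group:
        unique.update(haplotypes[animal])
    return len(unique)


def select_candidates(haplotypes, sequenced, candidates, n_select,
                      n_iter=2000, t_start=1.0, cooling=0.995, seed=0):
    """Pick n_select candidates that maximize the assumed GDI of
    (already sequenced animals + selected candidates) by simulated annealing.
    All parameter defaults are illustrative assumptions."""
    rng = random.Random(seed)
    selected = rng.sample(list(candidates), n_select)
    remaining = [a for a in candidates if a not in selected]
    current = gdi(list(sequenced) + selected, haplotypes)
    best, best_score = list(selected), current
    temp = t_start

    for _ in range(n_iter):
        if not remaining:
            break  # nothing left to exchange
        # Exchange one selected candidate for one unselected candidate.
        i = rng.randrange(n_select)
        j = rng.randrange(len(remaining))
        selected[i], remaining[j] = remaining[j], selected[i]
        new = gdi(list(sequenced) + selected, haplotypes)
        delta = new - current
        # Keep improvements; keep a worse group with a probability that
        # shrinks as the temperature falls (Metropolis criterion).
        if delta >= 0 or rng.random() < math.exp(delta / max(temp, 1e-12)):
            current = new
            if current > best_score:
                best, best_score = list(selected), current
        else:
            # Reject the exchange and restore the previous group.
            selected[i], remaining[j] = remaining[j], selected[i]
        temp *= cooling

    return best, best_score
```

Accepting an occasional worse group early in the run is what allows the search to leave a local optimum. The best group seen so far is stored separately, so those exploratory moves cannot lose it.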

Key Words: sequencing, simulation, imputation