Abstract #M65

Computing pipeline for genomic prediction and estimation using haplotypes and SNP markers.
D. Prakapenka*1, Y. Da1, 1University of Minnesota, Saint Paul, MN.

Haplotype analysis for genomic prediction and estimation requires considerably more data processing and has many more possible configurations of the prediction model than single-SNP analysis. To facilitate this analysis, we developed a computing pipeline with 3 components: preparation of input data for haplotype analysis, genomic prediction and estimation using GVCHAP, and analysis of GVCHAP results. Input preparation starts with formatting SNP data for 2 imputing programs. A utility program, with options to define haplotype blocks by a fixed number of SNPs or a fixed distance in base pairs per block, then divides the haplotypes from either imputing program into haplotype blocks. Each block is treated as a multi-allelic locus and is formatted as haplotype genotypes, where each haplotype genotype contains 2 haplotypes. The haplotype genotypes are used as an input file for running GVCHAP. Another utility program fills in most of the parameter file required by GVCHAP as an input file. The data preparation step also contains utility programs for defining validation samples, either by random assignment of individuals to each validation sample or from a user-provided list of individuals. GVCHAP is the main program for genomic prediction and estimation, providing GREML estimates and GBLUP for additive and dominance effects of haplotypes and single SNPs. To reduce the computing time spent calculating genomic relationships in cross-validation, GVCBLUP has a 2-step strategy that saves the genomic relationship matrix during the first fold of validation and reads in the saved genomic relationships for the remaining folds. This 2-step strategy is helpful for k-fold validations and for multiple traits.
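The block-definition step described above can be illustrated with a minimal sketch. It is not the pipeline's own utility program: the function names, the 0/1 allele strings, and the half-open index ranges are all assumptions made for illustration, and the actual input and output formats of the imputing programs and GVCHAP are not reproduced here.

```python
from typing import List, Sequence, Tuple

Block = Tuple[int, int]  # half-open SNP index range (start, end)


def blocks_by_snp_count(n_snps: int, snps_per_block: int) -> List[Block]:
    """Split n_snps consecutive SNPs into blocks of a fixed number of SNPs.

    The last block may be shorter if n_snps is not a multiple of the size.
    """
    if snps_per_block <= 0:
        raise ValueError("snps_per_block must be positive")
    return [(s, min(s + snps_per_block, n_snps))
            for s in range(0, n_snps, snps_per_block)]


def blocks_by_distance(positions: Sequence[int], block_bp: int) -> List[Block]:
    """Split SNPs into blocks spanning less than block_bp base pairs.

    Assumes positions are sorted base-pair coordinates on one chromosome.
    A new block starts when a SNP lies block_bp or more from the first
    SNP of the current block.
    """
    if block_bp <= 0:
        raise ValueError("block_bp must be positive")
    blocks: List[Block] = []
    start = 0
    for i in range(1, len(positions) + 1):
        if i == len(positions) or positions[i] - positions[start] >= block_bp:
            blocks.append((start, i))
            start = i
    return blocks


def haplotype_genotypes(hap1: str, hap2: str,
                        blocks: Sequence[Block]) -> List[Tuple[str, str]]:
    """Cut one individual's two phased haplotypes into per-block pairs.

    Each returned pair is one haplotype genotype: the two haplotypes the
    individual carries at that block, treated as a multi-allelic locus.
    """
    if len(hap1) != len(hap2):
        raise ValueError("both haplotypes must cover the same SNPs")
    return [(hap1[s:e], hap2[s:e]) for s, e in blocks]
```

For example, `haplotype_genotypes("01101", "01011", blocks_by_snp_count(5, 2))` yields one pair of haplotype strings per block; a real pipeline would typically recode each distinct string within a block as an integer allele code.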
The last component of the computing pipeline calculates observed prediction accuracies and produces an input file for graphical analysis of haplotype and SNP heritabilities.
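As a rough illustration of the validation workflow, the sketch below assigns individuals to validation samples at random and then averages a per-fold accuracy. It assumes that observed prediction accuracy is the Pearson correlation between predicted values and phenotypes within each validation sample; the abstract does not give the formula, and the function names and dictionary-based inputs are hypothetical rather than part of the pipeline.

```python
import random
from math import sqrt
from typing import Dict, Hashable, Iterable, List, Sequence


def assign_folds(ids: Iterable[Hashable], k: int, seed: int = 0) -> Dict[Hashable, int]:
    """Randomly assign each individual to one of k validation samples."""
    if k <= 0:
        raise ValueError("k must be positive")
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    # Dealing the shuffled ids round-robin keeps fold sizes within one of each other.
    return {ind: i % k for i, ind in enumerate(shuffled)}


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def mean_observed_accuracy(pred: Dict[Hashable, float],
                           pheno: Dict[Hashable, float],
                           folds: Dict[Hashable, int]) -> float:
    """Average the within-fold correlation of predictions and phenotypes."""
    per_fold: List[float] = []
    for f in sorted(set(folds.values())):
        members = [ind for ind, fold in folds.items() if fold == f]
        per_fold.append(pearson([pred[i] for i in members],
                                [pheno[i] for i in members]))
    return sum(per_fold) / len(per_fold)
```

In this sketch the predictions for each fold would come from a model fitted with that fold's phenotypes masked; a user-provided assignment can be passed as the `folds` dictionary in place of `assign_folds`.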

Key Words: genomic selection, haplotype, SNP