Abstract #169

Implications of limited dimensionality of genomic information on persistency of genomic predictions and GWAS.
Ignacy Misztal*¹, Ivan Pocrnic¹, Daniela Lourenco¹; ¹University of Georgia, Athens, GA.

The purpose of this study was to find possible explanations for peculiarities of the dimensionality (M) of genomic information. The gene content matrix derived from 35k to 60k SNP chips has a limited M as determined by singular value decomposition; identical results are obtained with the eigenvalues of the genomic relationship matrix. Even with a very large number of animals, M ranges from about 4,000 for commercial pigs and broiler chickens to about 15,000 in Holsteins. This number is normally attributed to the expected number of chromosome junctions as derived by Stam: M = 4NeL, where Ne is the effective population size and L is the genome size (in Morgans). However, approximations of realized accuracies that assume M for animals with the same information are not accurate. Accuracies of genomic predictions assuming M/4 animals in genomic recursions and the APY algorithm are >90% of those assuming full dimensionality. These recursions also suggest that predictions based on M animals with very high reliability should be both very accurate and persistent, and that predictions from large national evaluations in Holsteins could converge. However, the real accuracies seem lower than expected. The genome in a population can be visualized in 2 ways: first, as Ne haplotypes within each 1/4-Morgan segment; second, as 4NeL sequential segments. Eigenvalue analyses of the genomic information show that popular segments cluster along the genome. Consequently, the number of segments can be higher than that determined by singular values. In particular, M/4 clusters could account for 90% of segments. SNP selection decreases the dimensionality; the minimum is the number of causative SNP. SNP selection can eliminate clusters without substantial variation while pointing to clusters with high variation, potentially creating strong GWAS signals that are not related to QTL. Some ideas in this study were derived from simulated populations assuming complete genome coverage and an additive model. It remains to be seen whether accuracy predictions in real populations are affected by additional factors such as incomplete genome coverage and non-additive effects. Singular value analysis of gene content (or eigenvalue analysis of the genomic relationship matrix) helps in understanding the complexity of genomic selection.
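The equivalence between singular value analysis of gene content and eigenvalue analysis of the genomic relationship matrix can be illustrated with a minimal sketch on toy data. Everything specific here is an assumption made for illustration and is not taken from the study: the simulated genotypes (animals built from 20 hypothetical founder haplotypes), the VanRaden-style scaling of the relationship matrix, the 98% variance threshold used to count dimensions, and the helper name `dimensionality`.

```python
import numpy as np


def dimensionality(values, fraction=0.98):
    """Smallest number of leading values whose sum reaches `fraction` of the total."""
    values = np.sort(np.asarray(values, dtype=float))[::-1]
    cumulative = np.cumsum(values) / values.sum()
    return int(np.searchsorted(cumulative, fraction) + 1)


# Toy population (assumption): every animal carries two of 20 founder haplotypes,
# so the genomic information has far fewer dimensions than animals or SNP.
rng = np.random.default_rng(1)
n_hap, n_animals, n_snp = 20, 200, 500
haplotypes = rng.integers(0, 2, size=(n_hap, n_snp))
pairs = rng.integers(0, n_hap, size=(n_animals, 2))
gene_content = haplotypes[pairs[:, 0]] + haplotypes[pairs[:, 1]]  # coded 0/1/2

# Center gene content by SNP and build a VanRaden-style relationship matrix.
allele_freq = gene_content.mean(axis=0) / 2.0
Z = gene_content - 2.0 * allele_freq
G = Z @ Z.T / (2.0 * np.sum(allele_freq * (1.0 - allele_freq)))

# Squared singular values of Z are proportional to the eigenvalues of G,
# so both routes should give the same dimensionality.
singular_values = np.linalg.svd(Z, compute_uv=False)
eigenvalues = np.clip(np.linalg.eigvalsh(G), 0.0, None)  # drop round-off negatives

m_svd = dimensionality(singular_values ** 2)
m_eig = dimensionality(eigenvalues)
print("dimensionality from SVD of gene content:", m_svd)
print("dimensionality from eigenvalues of G:   ", m_eig)
```

In this toy setting the count is bounded by the number of founder haplotypes rather than by the 200 animals or 500 SNP, which mirrors the limited M described above only qualitatively; real populations and chips would need real genotype data.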

Key Words: genomic selection, APY algorithm, dimensionality