P3-13: Understanding the Effective Number of Independent Chromosome Segments as a Tuning Parameter for Genomic Prediction
PhD Student: Nicolas Frioni
Supervisor: Dr. Malena Erbe
Group: Animal Breeding and Genetics
Project Description:
Genomic prediction has become a widely used tool in livestock breeding for selecting young individuals that have neither performance records nor progeny information of their own. Many factors influence the accuracy of genomic prediction, among them the underlying genomic structure of a population. One parameter describing this structure is the number of chromosome segments that segregate independently, which is assumed to be population-specific (e.g. [1]). Different authors ([2]; [3]) have tried to express the number of independently segregating chromosome segments (Me) as a function of the effective population size (Ne) and the length of the genome. Theoretical derivations of these functions (e.g. [2]) are based on the expected linkage disequilibrium (LD) structure in the form of the squared correlation between haplotypes. Thus, populations with a similar history and similar LD patterns are expected to have similar values of Me.
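As a rough illustration of how such functions behave, the sketch below evaluates two approximations of Me that are commonly quoted in the genomic prediction literature: Me ≈ 2·Ne·L and Me ≈ 2·Ne·L / ln(4·Ne·L), with L the genome length in Morgan. The exact forms and their attribution should be checked against the cited papers. The function names and the example values (Ne between 50 and 200, L = 30 Morgan) are assumptions made purely for illustration.

```python
import math


def me_linear(ne: float, genome_morgan: float) -> float:
    """Assumed approximation: Me ~ 2 * Ne * L, with L in Morgan."""
    return 2.0 * ne * genome_morgan


def me_log_corrected(ne: float, genome_morgan: float) -> float:
    """Assumed approximation: Me ~ 2 * Ne * L / ln(4 * Ne * L)."""
    x = 4.0 * ne * genome_morgan
    if x <= 1.0:
        # ln(x) would be zero or negative, so the formula is not usable here.
        raise ValueError("4 * Ne * L must be greater than 1")
    return 2.0 * ne * genome_morgan / math.log(x)


if __name__ == "__main__":
    genome_morgan = 30.0  # hypothetical genome length, not taken from the text
    for ne in (50, 100, 200):  # hypothetical effective population sizes
        print(ne, me_linear(ne, genome_morgan), round(me_log_corrected(ne, genome_morgan), 1))
```

Both expressions depend only on Ne and L, which is why populations with similar Ne would be expected to share similar values of Me under these approximations.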
In another study ([4]), Me was determined empirically with a maximum likelihood approach from genomic prediction results obtained with mixed models that include the genomic relationship matrix. These estimates varied substantially between two dairy cattle breeds, although estimates of Ne based on pedigree and on LD structure were very similar. Empirically determined values of Me also differed between traits, which is unexpected for a population parameter. Another study ([5]) revealed that the amount of LD in cattle is massively biased when estimated from single nucleotide polymorphism (SNP) array data compared to what is found with genome-wide sequence data. It thus appears that Me is influenced by the scale of the genomic data used. As Me in turn is a major determinant of the accuracy of genomic prediction, it is important to find a scale-independent approach to estimate Me.
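To make the link between Me and accuracy concrete, the following sketch uses the widely quoted deterministic expectation r = sqrt(N·h² / (N·h² + Me)), where N is the training set size and h² the heritability, and inverts it to obtain the Me implied by an observed accuracy. This is a simplified stand-in and not the maximum likelihood procedure of [4]. The function names are hypothetical and the numbers in the usage lines are invented for illustration only.

```python
import math


def expected_accuracy(n_train: float, h2: float, me: float) -> float:
    """Assumed deterministic expectation: r = sqrt(N*h2 / (N*h2 + Me))."""
    return math.sqrt(n_train * h2 / (n_train * h2 + me))


def implied_me(n_train: float, h2: float, accuracy: float) -> float:
    """Solve the expression above for Me, given an observed accuracy r."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must lie strictly between 0 and 1")
    r2 = accuracy ** 2
    return n_train * h2 * (1.0 - r2) / r2


if __name__ == "__main__":
    # Hypothetical values: 5000 training animals, h2 = 0.5, Me = 2500.
    r = expected_accuracy(5000, 0.5, 2500)
    print(r, implied_me(5000, 0.5, r))
```

Under this expression, two traits with the same N and h² but different observed accuracies imply different values of Me, which mirrors the trait dependence described above.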
For these reasons, we need deeper insight into the concept of Me, its derivation, and possible applications. Simulated data with a predefined genomic structure will be the starting point to study the impact of different scales (effective population size, marker density, genomic architecture etc.) on Me and thus on the accuracy of genomic prediction. Empirical datasets from different species (chicken, pig, cattle) with different marker densities (from tens of thousands of markers from genotyping platforms up to millions from sequence data) are available to this project for testing newly developed approaches and studying different population structures.
This project will first focus on finding a meaningful quantitative-genetic definition of the parameter Me. Methodological findings from other projects of the Research Training Group regarding the modeling of non-independent variables and LD structure will be helpful for this part. Further aims of this project comprise developing methods to determine Me empirically based on different genomic scales, finding an appropriate deterministic formula to describe Me, and quantifying the magnitude and structure of its impact on the accuracy of genomic prediction. Furthermore, theoretical ideas to describe the expected maximum accuracy for a given Me will be developed. For this, the close collaboration of the Research Training Group with the Mercator Fellow Gustavo de los Campos, who has already worked on determining theoretical limits of the accuracy of prediction, will be valuable.
It is planned that the successful PhD candidate will conduct part of the research at the Bavarian State Research Centre for Agriculture in Grub. This PhD project further includes an extended research stay in the group of an international collaborator in Australia.
References:
[1] Daetwyler, H.D., Calus, M.P.L., Pong-Wong, R., de los Campos, G. and Hickey, J.M. (2013). Genomic prediction in animals and plants: Simulation of data, validation, reporting, and benchmarking. Genetics 193:347-365.
[2] Goddard, M.E., Hayes, B.J. and Meuwissen, T.H.E. (2011). Using the genomic relationship matrix to predict the accuracy of genomic selection. J. Anim. Breed. Genet. 128:409-421.
[3] Hayes, B.J., Visscher, P.M. and Goddard, M.E. (2009). Increased accuracy of artificial selection by using the realized relationship matrix. Genet. Res. 91:47-60.
[4] Erbe, M., Gredler, B., Seefried, F.R., Bapst, B. and Simianer, H. (2013). A function accounting for training set size and marker density to model the average accuracy of genomic prediction. PLoS ONE 8:e81046.
[5] Qanbari, S., Pausch, H., Jansen, S., Somel, M., Strom, T.M., Fries, R., Nielsen, R. and Simianer, H. (2014). Classic selective sweeps revealed by massive sequencing in cattle. PLoS Genet. 10:e1004148.