Prospects of whole-genome sequence data in animal and plant breeding

Rianne Van Binsbergen

doi:10.18174/413524

Abstract

The rapid decrease in the cost of DNA sequencing implies that whole-genome sequence data will be widely available within the next few years. Whole-genome sequence data includes all base pairs of the genome that show variation in the sequenced population. Consequently, the causal mutations (e.g. quantitative trait loci; QTL) are assumed to be included, which allows a given trait to be tested directly for association with a QTL and might lead to the discovery of new QTL or to higher accuracies in genomic prediction compared with currently available marker panels. The main aim of this thesis was to investigate the benefits of using whole-genome sequence data in animal and plant breeding compared with currently available marker panels. First, the potential and benefits of using whole-genome sequence data were studied in (dairy) cattle. Accuracy of genotype imputation to whole-genome sequence data was generally high, though it depended on the marker panel used. Contrary to expectations, genomic prediction showed no advantage of using whole-genome sequence data over a high-density marker panel. Thereafter, the use of whole-genome sequence data for QTL detection in tomato (S. lycopersicum) was studied. In a recombinant inbred line (RIL) population, more QTL were found with sequence data than with a marker panel, even though increasing marker density was not expected to provide additional power to detect QTL. In addition to the RIL population, an association panel showed that, even with limited imputation accuracy, the power of a genome-wide association study can be improved by using whole-genome sequence data. For successful application of whole-genome sequence data in animals or plants, genotype imputation will remain important for obtaining accurate sequence data for all individuals in a cost-effective way. Sequence data will increase the power of QTL detection in RIL populations, association panels, or outbred populations.
The added value of whole-genome sequence data in genomic prediction will be limited unless more is known about the biological background of traits and the functional annotation of DNA. Statistical models that incorporate this information and can efficiently handle large datasets also have to be developed.
