Inferring whole-genome histories in large population datasets

Jerome Kelleher, Anthony W Wohns, Yan Wong, Patrick K Albers, Chaimaa Fadil, Gil McVean

doi:10.1038/s41588-019-0483-y

Open Access

Journal: Nature Genetics
Publication Date: Sep 1, 2019
Citations: 204
License type: unspecified-oa

Affiliation: University of Oxford

Abstract

Inferring the full genealogical history of a set of DNA sequences is a core problem in evolutionary biology, as it encodes information about the events and forces that have influenced a species. However, current methods are limited, with the most accurate able to process no more than a hundred samples. With data sets consisting of millions of genomes being collected, there is a need for scalable and efficient inference methods to fully utilise these resources. We introduce an algorithm to infer whole-genome histories with accuracy comparable to the state of the art, but able to process four orders of magnitude more sequences. The approach also provides an “evolutionary encoding” of the data, enabling efficient calculation of relevant statistics. We apply the method to human data from the 1000 Genomes Project, Simons Genome Diversity Project and UK Biobank, showing that the inferred genealogies are rich in biological signal and efficient to process.
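
The idea of a genealogy as an "evolutionary encoding" can be illustrated with a toy example. The sketch below is not the authors' software. It assumes a single hypothetical tree stored as a parent map, with each mutation placed on the node above which it arose, and with node IDs numbered so that every parent has a larger ID than its children. Under those assumptions, derived allele frequencies for all sites fall out of one bottom-up sweep that counts the samples beneath each node, rather than a scan of every sample's genotype at every site. The names `PARENT`, `SAMPLES`, `MUTATIONS`, `samples_below` and `genotype_matrix` are illustrative only.

```python
# Hypothetical toy tree ((0,1)4,(2,3)5)6, stored as node -> parent (-1 marks the root).
# Assumption: every parent has a larger ID than its children.
PARENT = {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6, 6: -1}
SAMPLES = [0, 1, 2, 3]
# Hypothetical mutations: site ID -> node above which the derived allele arose.
MUTATIONS = {0: 4, 1: 2, 2: 5, 3: 0}


def samples_below(parent, samples):
    """Count the samples beneath every node in one sweep over the nodes."""
    counts = {u: 0 for u in parent}
    for s in samples:
        counts[s] = 1
    # Increasing ID order visits children before parents (see assumption above),
    # so each node's total is complete before it is added to its parent.
    for u in sorted(parent):
        p = parent[u]
        if p != -1:
            counts[p] += counts[u]
    return counts


def genotype_matrix(parent, samples, mutations):
    """Expand the encoding into an explicit sites x samples 0/1 matrix."""
    rows = {}
    for site, node in mutations.items():
        row = []
        for s in samples:
            # A sample carries the derived allele if the mutation's node is on
            # its path to the root (including the sample node itself).
            u, carries = s, 0
            while u != -1:
                if u == node:
                    carries = 1
                    break
                u = parent[u]
            row.append(carries)
        rows[site] = row
    return rows


if __name__ == "__main__":
    counts = samples_below(PARENT, SAMPLES)
    freq_tree = {site: counts[node] / len(SAMPLES) for site, node in MUTATIONS.items()}
    matrix = genotype_matrix(PARENT, SAMPLES, MUTATIONS)
    freq_matrix = {site: sum(row) / len(SAMPLES) for site, row in matrix.items()}
    assert freq_tree == freq_matrix
    print(freq_tree)  # {0: 0.5, 1: 0.25, 2: 0.5, 3: 0.25}
```

Both routes give the same frequencies, but the tree sweep costs time proportional to the number of nodes, while the explicit matrix grows with sites times samples. A real whole-genome history consists of many correlated trees along the genome rather than the single tree assumed here, so this toy only conveys why statistics can be computed on the genealogy instead of the raw genotypes.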