Abstract

The rapid digitization of genealogical and medical records enables the assembly of extremely large pedigree records spanning millions of individuals and trillions of pairs of relatives. Such pedigrees make it possible to investigate the sociological and epidemiological history of human populations on scales much larger than previously possible. Linear mixed models (LMMs) are routinely used to analyze extremely large animal and plant pedigrees for selective breeding. However, LMMs have not previously been applied to population-scale human family trees. Here, we present Sparse Cholesky factorIzation LMM (Sci-LMM), a modeling framework for studying population-scale family trees that combines techniques from the animal and plant breeding literature and from the human genetics literature. The proposed framework can construct a matrix of relationships between trillions of pairs of individuals and fit the corresponding LMM in several hours. We demonstrate the capabilities of Sci-LMM via simulation studies and by estimating the heritability of longevity and of reproductive fitness (quantified via number of children) in a large pedigree spanning millions of individuals and over five centuries of human history. Sci-LMM provides a unified framework for investigating the epidemiological history of human populations via genealogical records.
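
As background, a common single-component LMM from the quantitative-genetics literature can be written as follows. The notation (phenotype vector y, fixed-effect design matrix X, pedigree-based additive relationship matrix A) is ours, and Sci-LMM's exact formulation may include additional variance components.

```latex
\mathbf{y} = X\boldsymbol{\beta} + \mathbf{g} + \boldsymbol{\varepsilon},
\qquad \mathbf{g} \sim \mathcal{N}\!\left(\mathbf{0}, \sigma_g^{2} A\right),
\qquad \boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0}, \sigma_e^{2} I\right)
\\
\operatorname{Var}(\mathbf{y}) = \sigma_g^{2} A + \sigma_e^{2} I,
\qquad h^{2} = \frac{\sigma_g^{2}}{\sigma_g^{2} + \sigma_e^{2}}
```

Under this sketch, estimating heritability amounts to estimating the variance components σ_g² and σ_e²; with several relationship matrices K_k, the covariance generalizes to Σ_k σ_k² K_k + σ_e² I.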

Highlights

  • The Sci-LMM software can currently compute an identity-by-descent (IBD) matrix, an epistatic covariance matrix, and a dominance matrix
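
The three kinds of matrices named above can be illustrated on a toy pedigree. The sketch below uses standard constructions from the animal-breeding literature rather than Sci-LMM's own implementation: the tabular method for the IBD-based additive relationship matrix A (twice the kinship coefficient), a Cockerham-style dominance matrix that assumes no inbreeding, and the Hadamard product A∘A for additive-by-additive epistasis. The pedigree and the function names `additive_relationship` and `dominance` are hypothetical, and dense NumPy arrays are used only because the example is tiny; Sci-LMM's exact definitions may differ.

```python
import numpy as np

# Hypothetical toy pedigree: entry i is (sire, dam) given as indices of earlier
# individuals, or None when a parent is unknown. Parents must precede children.
PEDIGREE = [
    (None, None),  # 0: founder
    (None, None),  # 1: founder
    (0, 1),        # 2: child of 0 and 1
    (0, 1),        # 3: full sibling of 2
    (None, None),  # 4: founder
    (2, 4),        # 5: child of 2 and 4
]


def additive_relationship(pedigree):
    """Additive relationship matrix A (twice the kinship) via the tabular method."""
    n = len(pedigree)
    A = np.zeros((n, n))
    for i, (s, d) in enumerate(pedigree):
        # Off-diagonal: mean of j's relationships to i's parents (unknown parent -> 0).
        for j in range(i):
            a_js = A[j, s] if s is not None else 0.0
            a_jd = A[j, d] if d is not None else 0.0
            A[i, j] = A[j, i] = 0.5 * (a_js + a_jd)
        # Diagonal: 1 + inbreeding coefficient (half the relationship between parents).
        A[i, i] = 1.0 + (0.5 * A[s, d] if s is not None and d is not None else 0.0)
    return A


def dominance(pedigree, A):
    """Dominance relationship matrix, assuming no inbreeding (diagonal fixed at 1)."""
    n = len(pedigree)
    D = np.eye(n)

    def a(p, q):
        # Relationship between two parents; an unknown parent contributes 0.
        return A[p, q] if p is not None and q is not None else 0.0

    for i in range(n):
        s_i, d_i = pedigree[i]
        for j in range(i):
            s_j, d_j = pedigree[j]
            D[i, j] = D[j, i] = 0.25 * (a(s_i, s_j) * a(d_i, d_j) + a(s_i, d_j) * a(d_i, s_j))
    return D


A = additive_relationship(PEDIGREE)  # IBD-based additive relationships
D = dominance(PEDIGREE, A)           # dominance relationships
AA = A * A                           # additive-by-additive epistasis (elementwise product)
```

In this toy example, full siblings 2 and 3 get A = 0.5 and D = 0.25, while parent and offspring (2 and 5) get D = 0, matching the textbook values. At population scale most pairs are unrelated, so these matrices are mostly zeros, which is what makes sparse storage attractive.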

Summary

Introduction

Genealogical records can reflect social and cultural structures and record the flow of genetic material throughout history. Very large pedigree records have come into existence owing to the collaborative digitization of large genealogical records [1,2] and to the digitization of large cohorts collected by healthcare providers, spanning up to millions of individuals [3,4,5,6,7]. Such population-scale pedigrees make it possible to investigate the sociological and epidemiological history of human populations on a scale orders of magnitude larger than that of existing studies. Analyzing such pedigrees requires modeling complex covariance structures between trillions of pairs of individuals.
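
To make this scale concrete, assume for illustration only a pedigree of n = 2 × 10^6 individuals. The number of distinct pairs, and the memory needed to store a dense double-precision (8-byte) n × n covariance matrix, are then:

```latex
\binom{n}{2} = \frac{n(n-1)}{2} \approx \frac{\left(2\times 10^{6}\right)^{2}}{2} = 2\times 10^{12}\ \text{pairs},
\qquad
n^{2} \times 8\ \text{bytes} = 4\times 10^{12} \times 8\ \text{bytes} = 3.2\times 10^{13}\ \text{bytes} \approx 32\ \text{TB}
```

A dense matrix of this size is impractical to store or factorize, which helps explain why exploiting sparsity, as in the sparse Cholesky factorization that gives Sci-LMM its name, matters at this scale.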
