Abstract

Pedigrees contain information about the genealogical relationships among individuals and are of fundamental importance in many areas of genetic studies. However, pedigrees are often unknown and must be inferred from genetic data. Despite the importance of pedigree inference, existing methods are limited to inferring only close relationships or analyzing a small number of individuals or loci. We present a simulated annealing method for estimating pedigrees in large samples of otherwise seemingly unrelated individuals using genome-wide SNP data. The method supports complex pedigree structures such as polygamous families, multi-generational families, and pedigrees in which many of the member individuals are missing. Computational speed is greatly enhanced by the use of a composite likelihood function which approximates the full likelihood. We validate our method on simulated data and show that it can infer distant relatives more accurately than existing methods. Furthermore, we illustrate the utility of the method on a sample of Greenlandic Inuit.
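The abstract names two ingredients: a composite likelihood that approximates the full likelihood, and simulated annealing to search over pedigrees. The sketch below illustrates both under heavy simplifying assumptions, and it is not the authors' implementation. It uses the common pairwise form of composite likelihood (a sum of per-pair log-likelihoods, treating pairs as independent), which may differ from the exact form used in the paper. The relationship labels, the per-pair log-likelihood table `toy`, the function names `composite_log_likelihood` and `anneal`, and the geometric cooling schedule are all hypothetical. The search also ignores the consistency constraints that a real pedigree imposes across pairs.

```python
import math
import random

# Hypothetical relationship labels; a real pedigree search supports far richer structures.
LABELS = ("unrelated", "first_cousin", "half_sib", "full_sib")


def composite_log_likelihood(assignment, pair_loglik):
    """Pairwise composite log-likelihood: the sum of per-pair log-likelihoods.

    assignment: dict mapping a pair (i, j) to a relationship label.
    pair_loglik: dict mapping (i, j) -> {label: log-likelihood of that pair's
    genotype data under that label}. Treating pairs as independent is what
    makes this an approximation to the full pedigree likelihood.
    """
    return sum(pair_loglik[pair][label] for pair, label in assignment.items())


def anneal(pair_loglik, n_iter=5000, t_start=2.0, t_end=0.01, seed=0):
    """Maximize the composite log-likelihood by simulated annealing (toy version)."""
    rng = random.Random(seed)
    pairs = list(pair_loglik)
    # Start from the configuration in which every pair is unrelated.
    state = {pair: "unrelated" for pair in pairs}
    score = composite_log_likelihood(state, pair_loglik)
    best_state, best_score = dict(state), score
    for step in range(n_iter):
        # Geometric cooling schedule from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / (n_iter - 1))
        # Local move: relabel one randomly chosen pair.
        pair = rng.choice(pairs)
        old_label = state[pair]
        new_label = rng.choice([lab for lab in LABELS if lab != old_label])
        # Only one term of the sum changes, so the score update is incremental.
        delta = pair_loglik[pair][new_label] - pair_loglik[pair][old_label]
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            state[pair] = new_label
            score += delta
            if score > best_score:
                best_state, best_score = dict(state), score
    return best_state, best_score


# Hypothetical per-pair log-likelihoods for three individuals (not real data).
toy = {
    (0, 1): {"unrelated": -120.0, "first_cousin": -110.0, "half_sib": -95.0, "full_sib": -101.0},
    (0, 2): {"unrelated": -80.0, "first_cousin": -84.0, "half_sib": -90.0, "full_sib": -99.0},
    (1, 2): {"unrelated": -88.0, "first_cousin": -86.5, "half_sib": -93.0, "full_sib": -97.0},
}

if __name__ == "__main__":
    best, loglik = anneal(toy)
    print(best, loglik)
```

Because the composite likelihood is a sum, a move that relabels one pair changes only one term, so the score can be updated incrementally rather than recomputed from scratch. This is one plausible way such an approximation can speed up a stochastic search, offered here as an illustration rather than as the paper's stated reason.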

Highlights

  • Pedigree information is used in many areas of genetic analysis, including the discovery of disease-related markers in co-segregation analysis and family-based association studies [1], pedigree-informed haplotype and genotype imputation [2], and the estimation of variance components for quantitative traits [3]

  • Most population genetic inference methods are based on coalescent theory, which models the genealogical relationships among samples of genetic data on a time scale of N generations, where N is the effective population size

  • Simulation studies have shown that the coalescent is a poor approximation of the genealogical process over short time frames (fewer than log₂ N generations, where N is the population size), potentially leading to inaccurate inferences at these time scales [8, 9]
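To make the two time scales in the bullets above concrete: the coalescent measures time in units of N generations, while the cited breakdown occurs below roughly log₂ N generations. The hypothetical helper `coalescent_timescales` below simply tabulates both quantities for a few illustrative population sizes, which are not taken from the paper. For N = 10,000 the threshold is only about 13 generations, so relationships spanning a few generations, such as those in a pedigree, fall inside it.

```python
import math


def coalescent_timescales(n_e):
    """Return (coalescent time unit, short-time threshold), both in generations.

    The coalescent measures time in units of N generations; log2(N) generations
    is the horizon below which it is reported to approximate the genealogy poorly.
    """
    return n_e, math.log2(n_e)


if __name__ == "__main__":
    # Illustrative population sizes only.
    for n in (1_000, 10_000, 100_000):
        unit, horizon = coalescent_timescales(n)
        print(f"N = {n:>7,}: coalescent unit = {unit:,} generations, "
              f"log2(N) ~ {horizon:.1f} generations")
```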


Summary

Introduction

Pedigree information is used in many areas of genetic analysis, including the discovery of disease-related markers in co-segregation analysis and family-based association studies [1], pedigree-informed haplotype and genotype imputation [2], and the estimation of variance components for quantitative traits (e.g. heritability) [3].

The method's regularizer was designed to penalize the grouping of individuals into family clusters. It is motivated by the fact that in large data sets each individual has so many potential pedigree relationships that most individuals will be inferred to be related to at least one other individual in the sample, even when they are actually unrelated. This is essentially a multiple testing problem: as the number of individuals in the sample grows, the probability of correctly inferring an individual to be unrelated to everyone else in the sample shrinks.
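The multiple testing argument can be made concrete with a deliberately simplified model. Assume, hypothetically, that each comparison between an unrelated individual and another member of a sample of n individuals independently yields a false relationship call with a fixed per-pair rate p. The chance of at least one false call is then 1 - (1 - p)^(n - 1), which grows toward 1 as n increases. The function name `prob_any_false_relationship` and the rate of 0.001 below are illustrative assumptions, and real pairwise comparisons are not independent.

```python
def prob_any_false_relationship(n_samples, per_pair_fpr):
    """P(an unrelated individual is inferred related to at least one other sample).

    Assumes each of the n_samples - 1 comparisons independently produces a
    false relationship call with probability per_pair_fpr (a simplification).
    """
    return 1.0 - (1.0 - per_pair_fpr) ** (n_samples - 1)


if __name__ == "__main__":
    # Hypothetical per-pair false-positive rate of 0.1%.
    for n in (10, 100, 1_000, 10_000):
        print(n, round(prob_any_false_relationship(n, 1e-3), 3))
```

Under these assumptions, even a 0.1% per-pair error rate gives better-than-even odds of a spurious relative once the sample reaches a thousand individuals, which is the behavior the regularizer is meant to counteract.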

