Abstract

Summary: We developed PyLAE, a new tool for determining local ancestry along a genome using whole-genome sequencing data or high-density genotyping experiments. PyLAE can process an arbitrarily large number of ancestral populations (with or without an informative prior). Since PyLAE does not involve estimating many parameters, it can process thousands of genomes within a day. PyLAE can run on phased or unphased genomic data. We have shown how PyLAE can be applied to the identification of differentially enriched pathways between populations. The local ancestry approach results in higher enrichment scores compared to whole-genome approaches. We benchmarked PyLAE using the 1000 Genomes dataset, comparing the aggregated predictions with the global admixture results and the current gold standard program RFMix. Computational efficiency, minimal requirements for data pre-processing, straightforward presentation of results, and ease of installation make PyLAE a valuable tool to study admixed populations.

Availability and implementation: The source code and installation manual are available at https://github.com/smetam/pylae.

Highlights

  • In association studies, researchers combine samples with different phenotypes and compare single nucleotide polymorphism (SNP) frequencies between the two groups

  • An observed association may instead be due to heterogeneity of the study group in terms of provenance or origin

  • Instead of solving an “exact admixture” problem, we aim to find the smallest subset of populations whose combined admixture components are close to those of the individuals within a small tolerance margin
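
The last highlight describes a search for the smallest set of populations that approximates an individual's admixture profile. The sketch below illustrates that idea only; it is not PyLAE's implementation. It assumes that "combined" means an equal-weight average of the populations' admixture vectors and that "close" means a maximum absolute difference no greater than `tolerance`. The function name, population labels, and numbers are all hypothetical.

```python
from itertools import combinations


def smallest_matching_subset(individual, populations, tolerance=0.05):
    """Return the smallest tuple of population names whose averaged
    admixture vector lies within `tolerance` of the individual's vector,
    or None if no subset qualifies.

    Assumption: populations are combined by an equal-weight average.
    """
    names = sorted(populations)
    # Try subsets in order of increasing size so the first hit is a smallest one.
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            combined = [
                sum(populations[name][k] for name in subset) / size
                for k in range(len(individual))
            ]
            # Closeness is measured as the largest per-component difference.
            if max(abs(c - x) for c, x in zip(combined, individual)) <= tolerance:
                return subset
    return None


# Hypothetical reference populations, each described by three admixture components.
populations = {
    "pop_a": [1.0, 0.0, 0.0],
    "pop_b": [0.0, 1.0, 0.0],
    "pop_c": [0.0, 0.0, 1.0],
}
individual = [0.5, 0.5, 0.0]
result = smallest_matching_subset(individual, populations)
print(result)
```

The exhaustive search is exponential in the number of populations, so it is only practical for small reference panels; it is used here because it makes the "smallest subset within a tolerance" criterion explicit.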


Summary

Introduction

In association studies, researchers combine samples with different (usually opposing) phenotypes and compare single nucleotide polymorphism (SNP) frequencies between the two groups. An observed association may, however, be due to heterogeneity of the study group in terms of provenance or origin (for example, all people with the disease are of French origin, and the healthy cohort is Bulgarian). In this case, the two populations may differ in the frequencies of ancestry informative markers (AIMs) that are not causal to the phenotype.
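
A small worked example can make this confounding concrete. The sketch below uses made-up numbers: a marker whose allele frequency differs between two hypothetical populations (`pop_1` and `pop_2`) but has no effect on the phenotype. Because cases are drawn mostly from one population and controls mostly from the other, the pooled comparison shows a large frequency difference even though cases and controls within each population carry the allele at the same rate.

```python
def expected_allele_frequency(group_sizes, population_frequencies):
    """Expected allele frequency in a group made up of several populations."""
    total = sum(group_sizes.values())
    weighted = sum(n * population_frequencies[pop] for pop, n in group_sizes.items())
    return weighted / total


# Hypothetical allele frequencies of one ancestry informative marker.
# The marker is not causal: within a population, cases and controls
# carry it at the same frequency.
aim_frequency = {"pop_1": 0.6, "pop_2": 0.2}

# Hypothetical, unbalanced study design: number of individuals per population.
cases = {"pop_1": 90, "pop_2": 10}
controls = {"pop_1": 10, "pop_2": 90}

case_freq = expected_allele_frequency(cases, aim_frequency)
control_freq = expected_allele_frequency(controls, aim_frequency)
pooled_difference = case_freq - control_freq

print(f"cases: {case_freq:.2f}, controls: {control_freq:.2f}, "
      f"difference: {pooled_difference:.2f}")
```

With these numbers the pooled difference is roughly 0.32, entirely explained by the ancestry composition of the two groups.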

