Abstract
Patterson's D, also known as the ABBA-BABA statistic, and related statistics such as the f4-ratio are commonly used to assess evidence of gene flow between populations or closely related species. Currently available implementations often require custom file formats, implement only small subsets of the available statistics, and are too computationally inefficient to evaluate all gene flow hypotheses across data sets with many populations or species. Here, we present Dsuite, a new software package that efficiently calculates the D and f4-ratio statistics at genome scale across all combinations of tens or hundreds of populations or species, directly from a variant call format (VCF) file. Our program also implements statistics suited for application to genomic windows, providing evidence of whether introgression is confined to specific loci, and it can aid interpretation of a system of f4-ratio results through the "f-branch" method. Dsuite is available at https://github.com/millanek/Dsuite, is straightforward to use, is substantially more computationally efficient than comparable programs, and provides a convenient suite of tools and statistics, including some not previously available in any software package. Thus, Dsuite facilitates the assessment of evidence for gene flow, especially across larger genomic data sets.
Highlights
Introduction
Admixture between populations and hybridization between species are common, and a bifurcating tree is often insufficient to capture their evolutionary history (Green et al 2010; Patterson et al 2012; Tung & Barreiro 2017; Kozak et al 2018; Malinsky et al 2018).
Programs for calculating D and the f4-ratio from genomic data include ADMIXTOOLS (Patterson et al 2012), HyDe (Blischak et al 2018), and Comp-D (Mussmann et al 2019). What limits their utility is that none of these programs can handle the variant call format (VCF) (Danecek et al 2011), the standard file format for storing genetic polymorphism data produced by variant callers such as samtools (Li 2011) and GATK (DePristo et al 2011).
The Dsuite software package brings together a number of statistics for learning about admixture history from patterns of allele sharing across populations or closely related species.
Summary
Admixture between populations and hybridization between species are common, and a bifurcating tree is often insufficient to capture their evolutionary history (Green et al 2010; Patterson et al 2012; Tung & Barreiro 2017; Kozak et al 2018; Malinsky et al 2018). Programs for calculating D and the f4-ratio from genomic data include ADMIXTOOLS (Patterson et al 2012), HyDe (Blischak et al 2018), and Comp-D (Mussmann et al 2019). What limits their utility is that none of these programs can handle the variant call format (VCF) (Danecek et al 2011), the standard file format for storing genetic polymorphism data produced by variant callers such as samtools (Li 2011) and GATK (DePristo et al 2011). Dsuite addresses these issues: it calculates D and f4-ratio statistics directly from VCF files, is substantially more efficient than other programs, and provides an implementation of the f-branch statistic (Malinsky et al 2018) to aid interpretation.
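To make the D statistic concrete, the following is a minimal sketch of the standard ABBA-BABA calculation from per-site derived-allele frequencies in a trio of populations (P1, P2, P3) plus an outgroup. It is an illustration only, not Dsuite's code: the function name `patterson_d` and its list-based inputs are hypothetical, and it assumes biallelic sites whose derived-allele frequencies have already been estimated (for example, from a VCF file).

```python
from typing import Sequence


def patterson_d(
    p1: Sequence[float],
    p2: Sequence[float],
    p3: Sequence[float],
    p_out: Sequence[float],
) -> float:
    """Patterson's D from per-site derived-allele frequencies.

    Hypothetical helper for illustration; each argument holds one
    frequency per biallelic site, in the same site order.
    """
    abba = 0.0  # expected count of sites where P2 and P3 share the derived allele
    baba = 0.0  # expected count of sites where P1 and P3 share the derived allele
    for a, b, c, o in zip(p1, p2, p3, p_out):
        abba += (1 - a) * b * c * (1 - o)
        baba += a * (1 - b) * c * (1 - o)
    total = abba + baba
    if total == 0:
        raise ValueError("no informative (ABBA or BABA) sites")
    # D > 0 indicates excess allele sharing between P2 and P3,
    # D < 0 indicates excess sharing between P1 and P3.
    return (abba - baba) / total


# Toy data: two ABBA sites and one BABA site, outgroup fixed for the ancestral allele.
d = patterson_d([0, 0, 1], [1, 1, 0], [1, 1, 1], [0, 0, 0])
print(round(d, 4))
```

Under the null hypothesis of no gene flow, ABBA and BABA patterns are expected to be equally frequent, so D should not differ significantly from zero. Assessing significance requires a resampling procedure such as a block jackknife, which this sketch omits.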