Abstract
Background
The gene-specific sweep is a selection process in which an advantageous mutation, along with nearby neutral sites in a gene region, increases in frequency in the population. It has been shown to play important roles in ecological differentiation and phenotypic divergence in microbial populations. Identifying gene-specific sweeps in microorganisms will therefore not only provide insights into evolutionary mechanisms but also reveal potential genetic markers associated with biological phenotypes. However, current methods were developed mainly for detecting selective sweeps in eukaryotic data with sparse genotypes and are not readily applicable to prokaryotic data. Furthermore, these methods have not sufficiently addressed some challenges, such as the low spatial resolution of sweep regions and the lack of consideration of the spatial distribution of mutations.
Results
We proposed a novel gene-centric and spatially aware approach for identifying gene-specific sweeps in prokaryotes and implemented it in a Python tool, SweepCluster. Our method searches genotype datasets for gene regions with a high level of spatial clustering of pre-selected polymorphisms, assuming a null distribution model of neutral selection. The polymorphisms are pre-selected based on their genetic signatures, such as elevated population subdivision, excessive linkage disequilibrium, or significant phenotype association. Performance evaluation using simulated data showed that the sensitivity and specificity of the clustering algorithm in SweepCluster are both above 90%. Application of SweepCluster to two real datasets from the bacteria Streptococcus pyogenes and Streptococcus suis showed that pre-selection had a dramatic impact and significantly reduced uninformative signals. We validated our method using genotype data from Vibrio cyclitrophicus, the only available dataset of gene-specific sweeps in bacteria, and obtained a concordance rate of 78%. We noted that the concordance rate could be underestimated because of differences in reference genomes and clustering strategies. Application to human genotype datasets showed that SweepCluster is also applicable to eukaryotic data and is able to recover 80% of a catalog of known sweep regions.
Conclusion
SweepCluster is applicable to a broad category of datasets. It will be valuable for detecting gene-specific sweeps in diverse genotypic data and will provide novel insights into adaptive evolution.
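The idea of testing a gene region for spatial clustering of pre-selected polymorphisms against a null model can be illustrated with a small sketch. This is not SweepCluster's implementation: the abstract does not specify the null distribution, so the sketch assumes, purely for illustration, that under neutrality the selected SNPs fall uniformly along the genome, which makes the count inside a gene binomially distributed. The function names `binomial_upper_tail` and `gene_clustering_pvalue` are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, m, p):
    """Return P(X >= k) for X ~ Binomial(m, p)."""
    return sum(comb(m, i) * p**i * (1 - p) ** (m - i) for i in range(k, m + 1))


def gene_clustering_pvalue(snp_positions, gene_start, gene_end, genome_length):
    """Count selected SNPs inside [gene_start, gene_end) and test for enrichment.

    Assumed null (illustrative only): each selected SNP lands uniformly at
    random on the genome, so the in-gene count is Binomial(m, gene_len / L).
    """
    m = len(snp_positions)
    k = sum(gene_start <= pos < gene_end for pos in snp_positions)
    p = (gene_end - gene_start) / genome_length
    return k, binomial_upper_tail(k, m, p)


# Toy data: six selected SNPs on a 10 kb genome, four of them inside one gene.
positions = [100, 120, 130, 150, 5000, 9000]
count, pvalue = gene_clustering_pvalue(positions, 90, 200, 10_000)
print(count, pvalue)
```

A small p-value here only indicates that the selected SNPs are more concentrated in the gene than the uniform placement assumed above would predict; a realistic null would also need to account for variation in mutation density along the genome.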
Highlights
The gene-specific sweep is a selection process in which an advantageous mutation, along with nearby neutral sites in a gene region, increases in frequency in the population
We propose a new gene-centric approach for identifying gene-specific sweeps in prokaryotes, which searches for regions with a higher level of spatial clustering of single nucleotide polymorphisms (SNPs), assuming a null distribution model of SNPs under neutral selection
The CPU time is consistent with the expected time complexity O(n + nb · k): it is governed by optimizing the boundary SNPs when n is small, and by clustering the inner SNPs when n is large, as the ratio of boundary SNPs rapidly declines
Summary
The gene-specific sweep is a selection process in which an advantageous mutation, along with nearby neutral sites in a gene region, increases in frequency in the population. The process imprints genetic signatures on the population genomes, leading to lowered within-population genetic diversity, increased between-population differentiation, and/or high linkage disequilibrium [3,4,5]. When such selective sweeps occur only at specific gene regions under selection, without affecting genome-wide diversity, they are described as gene-specific sweeps [6]. The gene-specific sweep has been shown to play important roles in adaptive evolution in microbial populations, such as ecological differentiation in Prochlorococcus [7] and Synechococcus [8], speciation in the marine bacterium Vibrio cyclitrophicus (V. cyclitrophicus) [3, 9], and phenotypic divergence in the human-adapted pathogen Streptococcus pyogenes (S. pyogenes) [10]. Identifying gene-specific sweeps on a genome-wide scale will provide insights into the evolutionary mechanisms shaping genetic diversity and help to reveal potential genetic markers associated with ecological adaptation or phenotypic differentiation.
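One of the signatures named above, increased between-population differentiation, is commonly quantified per site with the fixation index FST. The following is a minimal sketch, assuming a biallelic SNP, two equally weighted populations, and the basic definition FST = (H_T − H_S) / H_T; it is not taken from the paper, and the function names are hypothetical.

```python
def gene_diversity(p):
    """Expected diversity 2p(1 - p) at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1, p2):
    """FST = (H_T - H_S) / H_T for two equally weighted populations.

    H_S is the mean within-population diversity; H_T is the diversity of the
    pooled population, computed from the mean allele frequency.
    """
    h_s = (gene_diversity(p1) + gene_diversity(p2)) / 2.0
    h_t = gene_diversity((p1 + p2) / 2.0)
    if h_t == 0.0:
        # Site is monomorphic in the pooled sample; no differentiation to measure.
        return 0.0
    return (h_t - h_s) / h_t


# An allele near fixation in one population and rare in the other.
print(fst_two_populations(0.9, 0.1))
```

Values near 1 indicate that the two populations carry different alleles at the site, the pattern expected where a sweep has occurred in only one of them; values near 0 indicate similar allele frequencies.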