Abstract
An important strategy for studying genome evolution is to investigate the clustering of orthologous genes among multiple genomes. The most popular approaches require that the distance between adjacent genes in a cluster be small. We investigate a different formulation based on constraining the overall size of a cluster, and we develop statistical significance estimates that allow direct comparison of clusters of different sizes. We first consider a restricted version, which requires that orthologous genes be strictly ordered within each cluster, and show that it can be solved in polynomial time. We then develop practical exact algorithms for the unrestricted problem, which allows paralogous genes within a genome and clusters that may not appear in every genome, under a general model in which a gene is allowed to appear in more than one orthologous group. We show that our algorithm can identify biologically relevant gene clusters on four bacterial genomes: Bacillus subtilis, Streptococcus pyogenes, Streptococcus pneumoniae, and Clostridium acetobutylicum. We also show that, on four yeast genomes (Saccharomyces cerevisiae, Saccharomyces paradoxus, Saccharomyces mikatae, and Saccharomyces bayanus), our algorithm can identify significantly more functionally enriched gene clusters than previous algorithms. A software program (GCFinder) and a list of the gene clusters found on the bacterial and yeast genomes are available at http://faculty.cse.tamu.edu/shsze/gcfinder .
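To make the contrast between the two formulations concrete, the following minimal sketch illustrates what constraining the overall size of a cluster can mean within a single genome. It is not the GCFinder algorithm and makes several simplifying assumptions that are not in the abstract: genes are given as (position, group) pairs sorted by position, each gene belongs to exactly one orthologous group, and the cluster size is measured as the span between the first and last gene. The function name `smallest_cluster_window` is hypothetical.

```python
from collections import Counter


def smallest_cluster_window(genes, groups):
    """Return (start, end) of the shortest span covering every group, or None.

    genes  : list of (position, group_id) tuples sorted by position
             (assumption: one orthologous group per gene).
    groups : iterable of group ids that the cluster must contain.
    """
    needed = set(groups)
    if not needed:
        return None

    counts = Counter()   # occurrences of each needed group in the window
    covered = 0          # number of needed groups currently in the window
    best = None
    left = 0

    for pos_r, grp_r in genes:
        if grp_r in needed:
            counts[grp_r] += 1
            if counts[grp_r] == 1:
                covered += 1

        # Shrink from the left while the window still covers every group.
        while covered == len(needed):
            pos_l, grp_l = genes[left]
            if best is None or pos_r - pos_l < best[1] - best[0]:
                best = (pos_l, pos_r)
            if grp_l in needed:
                counts[grp_l] -= 1
                if counts[grp_l] == 0:
                    covered -= 1
            left += 1

    return best


if __name__ == "__main__":
    # Hypothetical toy genome: positions are arbitrary integer coordinates.
    toy_genome = [(1, "A"), (2, "B"), (10, "C"), (11, "A"), (12, "B"), (13, "C")]
    print(smallest_cluster_window(toy_genome, {"A", "B", "C"}))
```

A candidate cluster would then be accepted when the returned span is below a chosen size bound, regardless of how the genes are spaced inside it. An adjacent-gene-distance criterion would instead bound each gap between neighbouring genes. The multi-genome setting, paralogs, genes in several orthologous groups, and the significance estimates described in the abstract are all outside the scope of this sketch.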