Abstract
Current biological evidence suggests a correlation between the function and the position of genes in chromosomes. Examples include operon structure in prokaryotic genomes and similar expression patterns of neighboring genes in some eukaryotic genomes. In this paper, we present a new model and algorithm for identifying conserved clusters from pairwise genome comparison. This generalizes a recent model called gene teams. A team is a set of orthologous genes that appear in two or more species, possibly in a different order, yet with the distance between adjacent genes of the team on each chromosome never exceeding a given threshold. We remove the constraint in the original model that each gene must have a unique copy in the chromosomes, and thus allow the analysis of complex prokaryotic or eukaryotic genomes with extensive paralogs. Our algorithm runs in O(mn) time and uses O(m+n) space, where m and n are the numbers of common genes in the respective chromosomes. We used this approach to study two bacterial genomes, E. coli and B. subtilis, and successfully identified 85 conserved clusters, including clusters containing uncharacterized genes and a large cluster consisting of 21 ribosomal proteins. Our implementation is publicly available at http://euler.slu.edu/~goldwasser/cogteams/.
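To make the team condition concrete, the following is a minimal sketch of a checker for it. It is not the O(mn) algorithm described above. It only tests whether a chosen set of gene occurrences on two chromosomes satisfies the distance constraint and covers the same gene families. The names `is_delta_chain` and `is_team`, and the representation of a gene as a `(position, family)` pair, are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed representation: (position on chromosome, gene family label).
# Several occurrences may share a family label, which models paralogs.
Gene = Tuple[int, str]


def is_delta_chain(positions: List[int], delta: int) -> bool:
    """True if consecutive sorted positions are never more than delta apart."""
    ordered = sorted(positions)
    return all(b - a <= delta for a, b in zip(ordered, ordered[1:]))


def is_team(occ_a: List[Gene], occ_b: List[Gene], delta: int) -> bool:
    """Check the team condition for chosen occurrences on two chromosomes.

    Both selections must cover the same non-empty set of gene families,
    and on each chromosome adjacent selected genes must lie within delta.
    Gene order is allowed to differ between the chromosomes.
    """
    families_a = {family for _, family in occ_a}
    families_b = {family for _, family in occ_b}
    return (
        bool(families_a)
        and families_a == families_b
        and is_delta_chain([pos for pos, _ in occ_a], delta)
        and is_delta_chain([pos for pos, _ in occ_b], delta)
    )


# Hypothetical toy data: family "a" has two copies on the second chromosome.
chrom_a = [(1, "a"), (3, "b"), (4, "c")]
chrom_b = [(10, "c"), (12, "a"), (13, "a"), (15, "b")]
print(is_team(chrom_a, chrom_b, delta=2))
print(is_team(chrom_a, chrom_b, delta=1))
```

With a threshold of 2 the toy selection qualifies, while a threshold of 1 rejects it because two adjacent selected genes on the first chromosome are 2 apart.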