Abstract

Homologous sequences are widely used to understand the functions of genes and proteins. However, there is no consensus on how to automatically assign functions to proteins, and existing algorithms differ in how they identify homologous clusters in a given set of sequences. In this article, we present an algorithm tailored to a specific kind of set: the coding sequences obtained from phylogenetically close genomes (of the same species, genus, or family). When modeled as a graph, these sets have their own characteristics: they form more homogeneous and denser clusters. To exploit this, our algorithm uses the clustering coefficient, whose maximization can lead to results that are expected from a biological point of view. We also present an algorithm that identifies sequence domains based on graph topology. Finally, we compare our results with those of TribeMCL, a well-established tool in the area.
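To make the quantity mentioned above concrete, the sketch below computes the standard local clustering coefficient (the fraction of a vertex's neighbour pairs that are themselves connected) and its average over a candidate cluster, on a sequence-similarity graph built from an edge list. This is only an illustration of the metric, not the authors' clustering or domain-identification algorithm; the function names, the sequence identifiers, and the idea that edges represent similarity hits are assumptions made for the example.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (seq_a, seq_b) similarity hits.

    Hypothetical input format: each pair is assumed to be one similarity hit.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are also linked to each other."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # coefficient is conventionally 0 for degree < 2
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj, nodes=None):
    """Mean local clustering coefficient over the subgraph induced by `nodes`.

    With nodes=None the whole graph is used.
    """
    nodes = set(adj) if nodes is None else set(nodes)
    if not nodes:
        return 0.0
    # Restrict every neighbourhood to the candidate cluster (induced subgraph).
    sub = {v: adj[v] & nodes for v in nodes}
    return sum(local_clustering(sub, v) for v in sub) / len(sub)


if __name__ == "__main__":
    # Hypothetical graph: a dense cluster A-D plus a sparse tail D-E-F.
    hits = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"),
            ("C", "D"), ("D", "E"), ("E", "F")]
    g = build_graph(hits)
    print(average_clustering(g, {"A", "B", "C", "D"}))  # dense cluster
    print(average_clustering(g, {"D", "E", "F"}))       # sparse chain
```

In this toy graph the four mutually similar sequences score 1.0, while the chain D-E-F scores 0.0, which mirrors the abstract's point that the dense, homogeneous clusters typical of closely related genomes stand out under this metric.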

