On the maximal cliques in c-max-tolerance graphs and their application in clustering molecular sequences

Katharina A Lehmann, Michael Kaufmann, Kay Nieselt, Stephan Steigele

doi:10.1186/1748-7188-1-9

Open Access

Journal: Algorithms for Molecular Biology
Publication Date: May 31, 2006
Citations: 19
License type: cc-by

Affiliation: University of Tübingen

Abstract

Given a set S of n locally aligned sequences, it is often necessary to partition it into groups of very similar sequences before subsequent computations, such as the generation of a phylogenetic tree. This article introduces a new clustering method that partitions S into subsets such that the overlap of each pair of sequences within a subset is at least a given percentage c of the lengths of the two sequences. We show that this problem can be reduced to finding all maximal cliques in a special kind of max-tolerance graph, which we call a c-max-tolerance graph. Previously we have shown that all maximal cliques in general max-tolerance graphs can be found efficiently in O(n^3 + out). Here, using a new kind of sweep-line algorithm, we show that the restriction to c-max-tolerance graphs yields a better runtime of O(n^2 log n + out). Furthermore, we present another algorithm that is much easier to implement and, though theoretically slower than the first one, still runs in polynomial time. We then experimentally analyze the number and structure of all maximal cliques in a c-max-tolerance graph, depending on the chosen c-value. We apply our simple algorithm to artificial and biological data and show that this implementation is much faster than the well-known application Cliquer. Finally, by introducing a new heuristic that uses the set of all maximal cliques to partition S, we show that the computed partition gives a reasonable clustering for biological data sets.
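The reduction described in the abstract can be illustrated with a minimal sketch. It assumes each aligned sequence is represented by the half-open interval (start, end) it covers on the query, and it assumes the usual max-tolerance edge rule with tolerance c times the interval length, so that two sequences are adjacent when their overlap is at least c times the length of the longer one. The function names are hypothetical, and the clique enumeration is a generic Bron-Kerbosch search with pivoting, not the sweep-line algorithm or the simpler algorithm of the article.

```python
from itertools import combinations


def overlap(a, b):
    """Length of the intersection of two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def c_max_tolerance_graph(intervals, c):
    """Adjacency sets: i and j are adjacent when their overlap is at least
    c times the length of the longer interval (assumed edge rule)."""
    adj = {i: set() for i in range(len(intervals))}
    for i, j in combinations(range(len(intervals)), 2):
        a, b = intervals[i], intervals[j]
        longest = max(a[1] - a[0], b[1] - b[0])
        if overlap(a, b) >= c * longest:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques with Bron-Kerbosch and pivoting
    (a generic stand-in, not the article's algorithm)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        # Pick the pivot with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Hypothetical toy input: three chained sequences and one isolated sequence.
intervals = [(0, 100), (40, 140), (80, 180), (300, 400)]
print(maximal_cliques(c_max_tolerance_graph(intervals, 0.5)))
```

In this toy input the first and third sequences overlap in only 20 positions, so with c = 0.5 they fall into two different maximal cliques that share the middle sequence. Raising c removes edges and splits the cliques further, which is the trade-off between cluster size and length of the common region that the article describes.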

Highlights

Viewing the subject sequences aligned to a query sequence that result from a BLAST-based [1] comparison, in many cases one can identify groups of sequences clustering around different subintervals of the query sequence.
One goal of this article was to provide a method that allows an automatic clustering of sequences returned from a BLAST run, such that the user can decide whether to maximize the lengths of the common region of the sequences within a cluster or whether to maximize the size of the clusters.
In this article we have shown that finding groups from BLAST reports can be reduced to computing maximal cliques in so-called c-max-tolerance graphs.

Summary

Introduction

Viewing the subject sequences aligned to a query sequence that result from a BLAST-based [1] comparison, in many cases one can identify groups of sequences clustering around different subintervals of the query sequence. Which cluster a given sequence appears to belong to, when judged by eye, depends strongly on the order in which the sequences are presented. Fig. 1a) shows a schematic sketch of aligned sequences in random order. The sequences seem to form two, or maybe three, groups. In Fig. 1b) the same sequences are ordered according to how many positions they have in common, and colors indicate those sequences that share a large part of their sequence. The algorithm finds three different clusters of sequences.

