Fast Optimal Circular Clustering and Applications on Round Genomes.

Tathagata Debnath, Mingzhou Song

doi:10.1109/tcbb.2021.3077573

Open Access

https://doi.org/10.1109/tcbb.2021.3077573

Abstract

Round genomes are found in bacteria, plant chloroplasts, and mitochondria. Genetic or epigenetic marks can present biologically interesting clusters along a circular genome. The circular data clustering problem groups N points on a circle into K clusters to minimize the within-cluster sum of squared distances. Repeatedly applying the K-means algorithm takes quadratic time, impractical for large circular datasets. To overcome this issue, we developed a reproducible fast optimal circular clustering (FOCC) algorithm of worst-case O(KN log^2 N) time. The core is a fast optimal framed clustering algorithm, which we designed by integrating two divide-and-conquer and one bracket dynamic programming strategies. The algorithm is optimal based on a property of monotonic increasing cluster borders over frames on linearized data. On clustering 50,000 circular data points, FOCC outruns brute-force or heuristic circular clustering by three orders of magnitude in time. We produced clusters of CpG sites and genes along three round genomes, exhibiting higher quality than heuristic clustering. More broadly, the presented subquadratic-time algorithms offer the fastest known solution to not only framed and circular clustering, but also angular, periodical, and looped clustering. We implemented these algorithms in the R package 'OptCirClust' (https://CRAN.R-project.org/package=OptCirClust).
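To make the problem statement concrete, the following is a minimal Python sketch of the brute-force baseline the abstract alludes to, not the FOCC algorithm itself and not the 'OptCirClust' API. It assumes the circle is linearized by appending a copy of the sorted points shifted by one circumference, that each frame is a window of N consecutive points on that doubled line, and that within-cluster distances are measured to the cluster mean inside the frame. The function names `linear_kmeans_dp` and `circular_clusters_bruteforce` are hypothetical, and the simple dynamic program used here is far slower than the subquadratic method described above.

```python
from itertools import accumulate


def linear_kmeans_dp(xs, k):
    """Optimal split of sorted xs into k contiguous clusters.

    Returns (total within-cluster sum of squares, list of cluster start indices).
    Plain O(k * n^2) dynamic program, for illustration only.
    """
    n = len(xs)
    ps = [0.0] + list(accumulate(xs))                   # prefix sums
    ps2 = [0.0] + list(accumulate(x * x for x in xs))   # prefix sums of squares

    def ssq(i, j):
        # Sum of squared deviations from the mean for xs[i:j], with j > i.
        s = ps[j] - ps[i]
        return max(ps2[j] - ps2[i] - s * s / (j - i), 0.0)

    inf = float("inf")
    # cost[q][j]: best cost of splitting xs[:j] into q clusters.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for q in range(1, k + 1):
        for j in range(q, n + 1):
            for i in range(q - 1, j):
                c = cost[q - 1][i] + ssq(i, j)
                if c < cost[q][j]:
                    cost[q][j] = c
                    back[q][j] = i

    # Walk the back-pointers to recover where each cluster starts.
    starts, j = [], n
    for q in range(k, 0, -1):
        j = back[q][j]
        starts.append(j)
    starts.reverse()
    return cost[k][n], starts


def circular_clusters_bruteforce(points, circumference, k):
    """Try every frame of the linearized circle and keep the cheapest clustering.

    Returns (best cost, sorted points, cluster label for each sorted point).
    Assumes 1 <= k <= len(points).
    """
    xs = sorted(p % circumference for p in points)
    n = len(xs)
    doubled = xs + [x + circumference for x in xs]   # linearized circle

    best = None
    for f in range(n):                               # frame = doubled[f:f+n]
        cost, starts = linear_kmeans_dp(doubled[f:f + n], k)
        if best is None or cost < best[0]:
            best = (cost, f, starts)

    cost, f, starts = best
    bounds = starts + [n]
    labels = [0] * n
    for c in range(k):
        for t in range(bounds[c], bounds[c + 1]):
            labels[(f + t) % n] = c                  # map frame index back to the circle
    return cost, xs, labels


# Example: positions in degrees; one cluster wraps around the 0/360 origin.
cost, xs, labels = circular_clusters_bruteforce(
    [350, 355, 5, 10, 170, 175, 180, 185], 360, 2
)
print(cost, list(zip(xs, labels)))
```

In this example the points near 350 to 10 degrees end up in one cluster even though they straddle the origin, which a purely linear clustering of the sorted values would miss. Scanning all N frames with a per-frame dynamic program is what makes the brute-force approach expensive on large inputs.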

Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Publication Date: May 4, 2021
Citations: 7
License type: publisher-specific-oa
