Abstract

AbstractComplex data sets are often unmanageable unless they can be subdivided and simplified in an intelligent manner. Clustering is a technique that is used in data mining and scientific analysis for partitioning a data set into groups of similar or nearby items. Hierarchical clustering is an important and well‐studied clustering method involving both top‐down and bottom‐up subdivisions of data. In this article we address the parallel complexity of hierarchical clustering. We describe known sequential algorithms for top‐down and bottom‐up hierarchical clustering. The top‐down algorithm can be parallelized, and when there are n points to be clustered, we provide an O(log n)‐time, n2‐processor Crew Pram algorithm that computes the same output as its corresponding sequential algorithm. We define a natural decision problem based on bottom‐up hierarchical clustering, and add this HIERARCHICAL CLUSTERING PROBLEM (HCP) to the slowly growing list of CC‐complete problems, thereby showing that HCP is one of the computationally most difficult problems in the COMPARATOR CIRCUIT VALUE PROBLEM class. This class contains a variety of interesting problems, and now for the first time a problem from data mining as well. By proving that HCP is CC‐complete, we have demonstrated that HCP is very unlikely to have an NC algorithm. This result is in sharp contrast to the NC algorithm which we give for the top‐down sequential approach, and the result surprisingly shows that the parallel complexities of the top‐down and bottom‐up approaches are different, unless CC equals NC. In addition, we provide a compendium of all known CC‐complete problems. © 2008 Wiley Periodicals, Inc. Complexity, 2008

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.