Abstract

Diluvian Clustering is an unsupervised grid-based clustering algorithm well suited to interpreting large sets of noisy compositional data. The algorithm is notable for its ability to identify clusters that are either compact or diffuse and clusters that have either a large number or a small number of members. Diluvian Clustering is fundamentally different from most algorithms previously applied to cluster compositional data in that its implementation does not depend upon a metric. The algorithm reduces in two-dimensions to a case for which there is an intuitive, real-world parallel. Furthermore, the algorithm has few tunable parameters and these parameters have intuitive interpretations. By eliminating the dependence on an explicit metric, it is possible to derive reasonable clusters with disparate variances like those in real-world compositional data sets. The algorithm is computationally efficient. While the worst case scales as O(N²) most cases are closer to O(N) where N is the number of discrete data points. On a mid-range 2014 vintage computer, a typical 20,000 particle, 30 element data set can be clustered in a fraction of a second.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.