Abstract

Clustering algorithms are commonly used for exploratory data analysis and data mining and, when used correctly, are powerful tools for gaining insight into the underlying structure of data. It is known, however, that some of these algorithms depend on the parameters with which they start, giving different results as these vary. Often there is also an element of randomness in the initialisation process, which greatly increases the difficulty of selecting an appropriately initialised solution. Effective use of these algorithms depends on the correct choice of initialisation; however, when exploring new data it is often difficult to obtain values appropriate to the problem objectively. Initialisation strategies that maximise the performance of the algorithm are therefore important to ensure that the solutions identified are both consistent with the structure of the data and reproducible. This thesis introduces a coherent strategy for dealing with initialisation, covering both the choice of parameters and randomness. A Separation Concordance (SeCo) framework is developed which uses a dual-measure approach to evaluate the solutions obtained by resampling the starting conditions. The SeCo framework also allows inference of an appropriate number of partitions within the data and introduces a SeCo map for visualising the solution space. The performance of these visualisations is compared and contrasted with existing methods through an exhaustive series of experiments for both algorithms tested, and the framework is shown to be effective in selecting a repeatable solution with high concordance to the underlying structure of the data. These results are benchmarked using a range of synthetic and real-world data sets whose composition ranges from trivial to complex.
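The abstract does not define the two SeCo measures, so the following is only a minimal sketch of the general idea of resampling starting conditions and scoring each resulting solution twice. It is not the thesis's method. It assumes k-means with random initial centres. As stand-ins, it uses the mean silhouette width for "separation" and the mean adjusted Rand index against the other runs for "concordance". The function names and the selection rule (`evaluate_initialisations`, `select_solution`) are hypothetical.

```python
import math
import random
import statistics
from collections import Counter


def kmeans(points, k, seed, max_iter=100):
    """Lloyd's k-means; the initial centres are k data points drawn at random."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centre (ties go to the lower index).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so this run has converged
        labels = new_labels
        # Move each centre to the mean of its members; an empty cluster keeps its centre.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width, used here as a stand-in separation measure."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)  # usual convention for a singleton cluster
            continue
        a = statistics.mean(own)
        b = min(
            statistics.mean([math.dist(p, q) for q, lab in zip(points, labels) if lab == c])
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return statistics.mean(scores)


def adjusted_rand_index(a, b):
    """Agreement between two labellings, invariant to label permutation."""
    def comb2(x):
        return x * (x - 1) / 2

    sum_ij = sum(comb2(v) for v in Counter(zip(a, b)).values())
    sum_a = sum(comb2(v) for v in Counter(a).values())
    sum_b = sum(comb2(v) for v in Counter(b).values())
    expected = sum_a * sum_b / comb2(len(a))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labellings are trivial
    return (sum_ij - expected) / (max_index - expected)


def evaluate_initialisations(points, k, seeds):
    """Hypothetical dual-measure scoring: (seed, separation, concordance) per run."""
    runs = {s: kmeans(points, k, s) for s in seeds}
    results = []
    for s, labels in runs.items():
        # Concordance: how well this run agrees, on average, with every other run.
        agreement = [adjusted_rand_index(labels, o) for t, o in runs.items() if t != s]
        results.append((s, silhouette(points, labels), statistics.mean(agreement)))
    return results


def select_solution(results):
    """Hypothetical rule: highest concordance, with ties broken by separation."""
    return max(results, key=lambda r: (r[2], r[1]))
```

Runs that fall into a poor local optimum tend to agree less with the other runs, which is what the concordance score is meant to expose. One plausible way to probe the number of partitions, in the spirit of the abstract, is to repeat `evaluate_initialisations` over a range of `k` and compare the mean concordance. The thesis's actual inference procedure may differ.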
