Abstract
Clustering has been widely adopted in numerous applications, including pattern recognition, data analysis, image processing, and market research. In data mining, traditional clustering algorithms that use distance-based measures to quantify the difference between data objects are unsuitable for non-numeric attributes such as nominal, Boolean, and categorical data. Applying an unsuitable similarity measure in clustering may cause valuable information embedded in the data attributes to be lost, resulting in low-quality clusters. This paper proposes a novel hierarchical clustering algorithm, referred to as MPM, for clustering non-numeric data. The goals of MPM are to retain the data features of interest while effectively grouping data objects into clusters with high intra-cluster similarity and low inter-cluster similarity. MPM achieves these goals through two principal methods: (1) a novel similarity measure that captures the "characterized properties" of information, and (2) matrix permutation and matrix partitioning applied to the output of the similarity measure (arranged as a similarity matrix) in order to assign data objects to appropriate clusters. This study also proposes a heuristic-based algorithm, Heuristic_MPM, to reduce the processing time required for matrix permutation and matrix partitioning, which together account for the bulk of MPM's total execution time.
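The abstract does not define MPM's own similarity measure, so the sketch below only illustrates what a similarity matrix over non-numeric records looks like. It assumes the simple matching coefficient (the fraction of attributes on which two records agree), a standard measure for categorical data that is *not* the "characterized properties" measure proposed here. The records and the function names `simple_matching` and `similarity_matrix` are hypothetical.

```python
from typing import List, Sequence


def simple_matching(a: Sequence[str], b: Sequence[str]) -> float:
    """Simple matching coefficient: share of attributes with equal values.

    Stand-in measure for illustration only; not MPM's similarity measure.
    """
    if len(a) != len(b) or not a:
        raise ValueError("records must be non-empty and equally long")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def similarity_matrix(records: List[Sequence[str]]) -> List[List[float]]:
    """Pairwise n x n similarity matrix; symmetric with 1.0 on the diagonal."""
    n = len(records)
    return [[simple_matching(records[i], records[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical categorical records: (colour, size, in_stock)
records = [
    ("red", "small", "yes"),
    ("red", "large", "yes"),
    ("blue", "large", "no"),
]
S = similarity_matrix(records)
for row in S:
    print(" ".join(f"{v:.2f}" for v in row))
# 1.00 0.67 0.00
# 0.67 1.00 0.33
# 0.00 0.33 1.00
```

A matrix of this kind is the input that the permutation and partitioning steps then reorder and split into clusters. Those steps are not reproduced here because the abstract does not specify them.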