Abstract

Agglomerative hierarchical clustering plays a vital role in many application areas, such as medicine, bioinformatics, and information retrieval. It generates clusters by iteratively merging sub-clusters, so the merging criterion is critical to its performance. However, most existing techniques do not consider global features of clusters when deciding which clusters to merge, and they cannot undo a wrong merge once it has been made. These factors contribute to their poor performance. Existing techniques also face challenges on real-world databases because they are designed for either numeric or categorical data, whereas real-world data, such as medical databases, contain mixed attributes. To address these drawbacks, we propose a novel agglomerative hierarchical clustering method that uses the spread of data clusters as the merging criterion. This takes the overall distribution of clusters into account when merging them. Variance and entropy are used to measure the spread of a cluster in numeric and categorical attributes, respectively. To counter the effect of a wrong merge, the proposed method allows reallocation of data objects between clusters. We have experimented on real-life medical databases, and the results show the efficacy of the proposed approach.

Keywords (machine-generated, not supplied by the authors): Accuracy Rate, Data Object, Numeric Attribute, Categorical Attribute, Hierarchical Agglomerative Cluster
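The spread-based merging idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything below is an assumption made for illustration: spread is taken as the sum of per-attribute population variance (numeric attributes) and Shannon entropy (categorical attributes), numeric attributes are assumed to be scaled to comparable ranges beforehand, and the pair of clusters whose union has the smallest spread is merged. The function names `variance`, `entropy`, `cluster_spread`, and `best_merge` are hypothetical, and the reallocation step is not shown.

```python
import math
from collections import Counter
from itertools import combinations


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def entropy(values):
    """Shannon entropy (base 2) of a list of category labels."""
    n = len(values)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(values).values())


def cluster_spread(rows, numeric_idx, categorical_idx):
    """Assumed spread measure: variance over numeric attributes
    plus entropy over categorical attributes."""
    numeric_part = sum(variance([r[i] for r in rows]) for i in numeric_idx)
    categorical_part = sum(entropy([r[i] for r in rows]) for i in categorical_idx)
    return numeric_part + categorical_part


def best_merge(clusters, numeric_idx, categorical_idx):
    """Return the index pair (i, j) whose merged cluster has the smallest spread."""
    return min(
        combinations(range(len(clusters)), 2),
        key=lambda p: cluster_spread(
            clusters[p[0]] + clusters[p[1]], numeric_idx, categorical_idx
        ),
    )


# Toy data: each row is (numeric attribute, categorical attribute).
clusters = [
    [(1.0, "a"), (1.2, "a")],
    [(1.1, "a")],
    [(9.0, "b"), (9.5, "b")],
]
print(best_merge(clusters, numeric_idx=[0], categorical_idx=[1]))
```

On this toy data, the first two clusters share the same category and have nearby numeric values, so their union has the lowest spread and is the pair selected. How the paper actually weights or combines the two measures may differ from this unweighted sum.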
