Abstract

A key task in data mining is to partition a dataset automatically and appropriately. Classification methods seek partitions, defined by combinations of attribute-value pairs, that best fit the partition given by the target concepts. Clustering methods, in contrast, seek partitions that best characterize a given dataset with respect to a similarity measure. The choice of distance or similarity measure is therefore one of the most important research topics in data mining. However, empirical comparisons of such measures have not been studied in the literature. In this paper, several types of similarity measures are compared in three clinical contexts: datasets composed only of categorical attributes, datasets with a mixture of categorical and numerical attributes, and datasets composed only of numerical attributes. Experimental results show that simple similarity measures perform as well as newly proposed measures.

