Abstract

More than 50 clustering algorithms have been developed for extracting meaningful information from large datasets and grouping individuals according to their characteristics. In real-world research, data often contain variables of several types, so it is important to select a clustering algorithm suited to those data types. In this study, we describe EM (Expectation Maximization) and Two-Step clustering, methods developed in recent years that are considered among the best for datasets containing mixed types of variables. Our second aim is to compare these methods on a dataset collected in the health field. These algorithms are generally recommended for large datasets, but they are also used on medium-sized datasets, which are more common in real-world research. Therefore, fifty control subjects and fifty patients with polycystic ovary syndrome were included in the study. Nineteen variables were measured for these subjects in total: thirteen quantitative and six qualitative. Clusters were obtained with the EM and Two-Step clustering methods. To evaluate the agreement between the clusters obtained from each algorithm and the actual known patient and control groups, the Kappa coefficient was used. The EM clustering algorithm showed a higher agreement coefficient than Two-Step clustering (Kappa=0.740; p
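To make the idea of EM clustering on mixed-type data concrete, here is a minimal sketch. It assumes a finite mixture model with local independence: within each cluster, quantitative variables follow independent Gaussians and qualitative variables follow independent categorical distributions (integer-coded from 0). The abstract does not state which software or model specification the study used, so this is not the authors' implementation; `em_mixed` and its parameters are hypothetical, and the small `eps` terms are an added regularisation assumption.

```python
import numpy as np


def em_mixed(X_num, X_cat, n_clusters, n_iter=200, tol=1e-8, seed=0, eps=1e-6):
    """Hypothetical EM clustering for mixed data (local-independence mixture).

    X_num: (n, p) float array of quantitative variables.
    X_cat: (n, q) int array of qualitative variables coded 0..L_j-1.
    Returns hard cluster labels and the final log-likelihood.
    """
    rng = np.random.default_rng(seed)
    n = X_num.shape[0]
    n_levels = X_cat.max(axis=0) + 1
    # Random soft assignments to start; the first M-step turns them into parameters.
    resp = rng.dirichlet(np.ones(n_clusters), size=n)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # M-step (Gaussian part): mixing weights, means and diagonal variances.
        nk = np.maximum(resp.sum(axis=0), eps)
        weights = nk / nk.sum()
        means = resp.T @ X_num / nk[:, None]
        diff = X_num[:, None, :] - means[None, :, :]
        var = np.einsum("nk,nkp->kp", resp, diff**2) / nk[:, None] + eps
        # E-step (Gaussian part): log weight + log Gaussian density, shape (n, K).
        log_joint = np.log(weights)[None, :] - 0.5 * (
            np.log(2 * np.pi * var)[None, :, :] + diff**2 / var[None, :, :]
        ).sum(axis=2)
        for j in range(X_cat.shape[1]):
            # M-step (categorical part): smoothed category probabilities per cluster.
            counts = resp.T @ np.eye(n_levels[j])[X_cat[:, j]] + eps
            probs = counts / counts.sum(axis=1, keepdims=True)
            # E-step (categorical part): add log P(observed category | cluster).
            log_joint += np.log(probs[:, X_cat[:, j]]).T
        # Normalise with log-sum-exp to get responsibilities and the log-likelihood.
        m = log_joint.max(axis=1, keepdims=True)
        log_norm = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
        resp = np.exp(log_joint - log_norm)
        ll = float(log_norm.sum())
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return resp.argmax(axis=1), ll
```

With `n_clusters=2`, the returned labels can be compared against known patient/control groups; cluster numbers are arbitrary, so they must be matched to groups before computing an agreement statistic.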
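The evaluation step compares cluster assignments with the known patient and control labels. A minimal sketch follows, assuming the Kappa coefficient is Cohen's kappa, κ = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement from the marginal distributions. Because cluster numbers are arbitrary, the sketch tries every one-to-one relabelling of clusters to groups and keeps the highest kappa; the abstract does not say how the study aligned labels, and both function names are hypothetical.

```python
from itertools import permutations

import numpy as np


def cohen_kappa(a, b):
    """Cohen's kappa for two labelings of the same subjects."""
    a, b = np.asarray(a), np.asarray(b)
    p_o = np.mean(a == b)  # observed agreement
    # Chance agreement from the two marginal label distributions.
    p_e = sum(np.mean(a == c) * np.mean(b == c) for c in np.union1d(a, b))
    return float((p_o - p_e) / (1 - p_e))


def kappa_vs_known_groups(known, clusters):
    """Hypothetical helper: kappa after mapping clusters to groups (best mapping kept)."""
    known, clusters = np.asarray(known), np.asarray(clusters)
    groups, cluster_ids = np.unique(known), np.unique(clusters)
    if len(groups) != len(cluster_ids):
        raise ValueError("expects as many clusters as known groups")
    best = -np.inf
    for perm in permutations(groups):
        mapping = dict(zip(cluster_ids, perm))
        relabelled = np.array([mapping[c] for c in clusters])
        best = max(best, cohen_kappa(known, relabelled))
    return best
```

For example, with five controls (0) and five patients (1), clusters `[1, 1, 1, 1, 0, 0, 0, 0, 0, 0]` map best as cluster 1 to controls, giving p_o = 0.9, p_e = 0.5 and kappa = 0.8. Once labels are aligned, scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same statistic; testing whether kappa differs from zero is not covered here.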
