Medical data processing is one of the priority areas of machine learning. Data obtained during medical patient monitoring are usually complex and heterogeneous in nature. Solving clustering, classification, or forecasting problems on these data requires creating new methods or improving existing ones to increase decision accuracy and effectiveness. Classical clustering approaches and the fuzzy c-means clustering method were analyzed. Based on multiagent systems theory, it is proposed to use separate rules within the c-means method for selecting elites when forming clusters and for selecting the best of them according to the chosen intra-cluster distance measures. The result of solving such a problem is the number of clusters and the number of elements in each of them. The method quality was tested on the Fisher iris data set using three measures of intra-cluster distance: Mahalanobis distance, Mahalanobis distance considering the membership function, and Kullback-Leibler entropy. The highest accuracy, 98%, was obtained for the distance measured by the Kullback-Leibler entropy. Therefore, this measure was chosen to solve the clustering problem for medical monitoring data on prostate disease. The medical monitoring data were divided into four classes of patient states: “healthy persons”, “non-metastatic patients”, “metastatic patients” and “hormone-resistant patients”. The clustering accuracy on the medical data was 95.6%. In addition to accuracy, the confusion matrix and the ROC and LF curves were used to assess the method quality. The minimum value of the ROC curve was 0.96 for Fisher's irises and 0.95 for the medical monitoring data, which indicates the high quality of the proposed clustering method. The loss function value is also small (-0.056 and -0.0176 for the two data sets, respectively), which indicates that the optimal number of clusters and the distribution of data among them were obtained.
Based on an analysis of the obtained results, the proposed method can be recommended for clustering monitoring data in medical information and diagnostic decision support systems.
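For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of the standard (textbook) fuzzy c-means iteration, together with the usual definition of the Kullback-Leibler divergence for discrete distributions. It is not the proposed method: it uses plain Euclidean distance, omits the elite-selection rules, and does not determine the number of clusters. The abstract does not state how the Kullback-Leibler measure is applied to clusters, so it is shown only as a standalone definition. The function names `fuzzy_c_means` and `kl_divergence`, the fuzzifier `m = 2`, and the toy points are all assumptions made for illustration.

```python
import math


def fuzzy_c_means(points, init_centers, m=2.0, n_iter=100, tol=1e-6):
    """Textbook fuzzy c-means with Euclidean distance (illustrative only).

    points: list of equal-length tuples; init_centers: starting centers.
    Returns (centers, u), where u[i][j] is the membership of point i in cluster j.
    """
    centers = [tuple(c) for c in init_centers]
    n_clusters = len(centers)
    n_dims = len(points[0])
    u = []
    for _ in range(n_iter):
        # Membership update: each row of u sums to 1.
        u = []
        for x in points:
            # Guard against a zero distance when a point sits on a center.
            d = [max(math.dist(x, c), 1e-12) for c in centers]
            row = [
                1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(n_clusters))
                for j in range(n_clusters)
            ]
            u.append(row)
        # Center update: mean of the points weighted by membership ** m.
        new_centers = []
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(len(points))]
            total = sum(w)
            new_centers.append(tuple(
                sum(w[i] * points[i][dim] for i in range(len(points))) / total
                for dim in range(n_dims)
            ))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # stop once the centers no longer move
            break
    return centers, u


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) for discrete distributions.

    eps is a small constant (an assumption here) that avoids log(0).
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy data: two well-separated groups in the plane.
toy_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
toy_centers, toy_u = fuzzy_c_means(toy_points, [(0.0, 0.0), (6.0, 6.0)])
```

On the toy points, the first three points should receive a high membership in the first cluster and the last three in the second. Replacing `math.dist` with a Mahalanobis or divergence-based measure would be the natural place to experiment with the intra-cluster distances named in the abstract.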