Abstract
In this paper, we propose a diabetes data anomaly detection approach based on hierarchical clustering and a support vector machine (SVM), named hierarchical support vector machine (HCSVM). In HCSVM, a clustering algorithm first groups diabetes data sets that share the same data characteristics and divides the data into significantly abnormal parts and potentially abnormal parts. A convolutional neural network (CNN) is then used to detect and analyze the data in each part. The feature vector output by the CNN's fully connected layer serves as the input to the SVM classifier, which constructs the optimal separating hyperplane in a high-dimensional space so that abnormal diabetes data can be detected in a more targeted way. Finally, we run experiments on a real diabetes data set collected by a hospital and use ROC curves to compare the proposed approach with the random forest, k-nearest neighbors (KNN), and SVM algorithms. The results show that HCSVM, which combines hierarchical clustering with SVM, achieves better performance than the compared algorithms.
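The ROC-based evaluation mentioned above can be illustrated with a minimal sketch. It computes the area under the ROC curve from anomaly scores by pairwise comparison (the Mann-Whitney formulation). The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the paper's code or data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for abnormal (positive), 0 for normal (negative).
    scores: higher means "more likely abnormal".
    Equals the probability that a random abnormal record is scored
    above a random normal record, with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two normal and two abnormal records.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 positive/negative pairs are ordered correctly
```

A detector with a larger area under the curve ranks abnormal records above normal ones more consistently, which is the basis for comparing classifiers such as random forest, KNN, and SVM on the same data.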