Abstract

The rapid development of information technology has pushed medical informatization toward intelligence. Medical health big data provides the basic data resources for intelligent medical services and smart healthcare, and classifying this data is of great significance for making medical information intelligent. Because the KNN (K-Nearest Neighbor) classification algorithm is simple, it has been widely used in many fields. However, when the sample size is large and each sample has many feature attributes, the classification efficiency of KNN drops sharply. This paper proposes an improved KNN algorithm and compares it with the traditional KNN algorithm. Classification is performed in the neighborhood of the query instance, as in the conventional KNN classifier, and a weight is assigned to each class. The algorithm takes the class distribution around the query instance into account so that the assigned weights do not adversely affect outliers. To address the shortcomings of traditional KNN on large data sets, the paper also proposes an improved KNN algorithm based on cluster denoising and density cropping. This algorithm removes noise by clustering and speeds up the search for the K nearest neighbors, which improves classification efficiency while maintaining the classification accuracy of KNN. The experimental results show that the proposed algorithm effectively improves the classification efficiency of KNN on large data sets while maintaining its classification accuracy, giving good overall classification performance.
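
The abstract does not spell out how cluster denoising and density cropping work, so the following Python sketch shows one plausible reading under stated assumptions: k-means clusters the training set, a sample is treated as noise when its label disagrees with the majority label of its cluster, and density cropping greedily thins dense regions by keeping a sample only if no already-kept sample of the same class lies within a given radius. The function names (`kmeans`, `cluster_denoise`, `density_crop`, `knn_predict`) and the parameters `n_clusters` and `radius` are hypothetical, not taken from the paper.

```python
import math
import random
from collections import Counter

def kmeans(X, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for every point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(X, n_clusters)]
    assign = [0] * len(X)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        assign = [min(range(n_clusters), key=lambda j: math.dist(p, centroids[j]))
                  for p in X]
        # Update step: move each centroid to the mean of its members
        for j in range(n_clusters):
            members = [X[i] for i in range(len(X)) if assign[i] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def cluster_denoise(X, y, n_clusters, seed=0):
    """HYPOTHETICAL denoising: drop samples whose label disagrees with their cluster's majority."""
    assign = kmeans(X, n_clusters, seed=seed)
    majority = {j: Counter(y[i] for i in range(len(X)) if assign[i] == j).most_common(1)[0][0]
                for j in set(assign)}
    keep = [i for i in range(len(X)) if y[i] == majority[assign[i]]]
    return [X[i] for i in keep], [y[i] for i in keep]

def density_crop(X, y, radius):
    """HYPOTHETICAL cropping: keep a sample only if no kept sample of the same class is within radius."""
    kept_X, kept_y = [], []
    for p, label in zip(X, y):
        if all(l != label or math.dist(p, q) > radius for q, l in zip(kept_X, kept_y)):
            kept_X.append(p)
            kept_y.append(label)
    return kept_X, kept_y

def knn_predict(X, y, x_t, k):
    """Standard majority-vote KNN with Euclidean distance."""
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], x_t))
    return Counter(y[i] for i in order[:k]).most_common(1)[0][0]

# Two tight, well-separated blobs plus one mislabelled point inside blob "a"
X = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
     (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1),
     (0.05, 0.05)]
y = ["a"] * 4 + ["b"] * 4 + ["b"]

Xd, yd = cluster_denoise(X, y, n_clusters=2)
Xc, yc = density_crop(Xd, yd, radius=0.15)
print(len(X), len(Xd), len(Xc))               # 9 8 2
print(knn_predict(Xc, yc, (0.2, 0.3), k=1))   # a
```

In this sketch the speed-up comes from shrinking the training set: each query scans only the samples that survive cropping. The radius trades speed against accuracy and would need tuning per data set.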

Highlights

  • The development of information technology has made digital medical technology more mature. Medical data is growing at an unprecedented rate, and biomedical research has become a typical data-intensive science, producing the data explosion known as "big data." In the big data era, data has become a new strategic resource and an important driver of innovation, and it is changing how biomedical research is done and how people live and think.

  • The results show that classification speed improves while classification accuracy is maintained.

  • If the class weighting factors take fixed values for each class of a data set, i.e. they do not depend on the query instance and favor the minority classes, the algorithm may place the query instance in the correct cluster and still fail to classify it correctly.
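
The failure mode in the last highlight can be illustrated with a toy vote. The Python sketch below uses hypothetical labels and weights: the query's five nearest neighbors mostly belong to the majority class, but a query-independent weight that favors the minority class overturns the vote. The names `weighted_vote`, `major`, and `minor` are illustrative only.

```python
from collections import Counter

def weighted_vote(neighbour_labels, W):
    """Return the class c maximising W[c] * (number of neighbours labelled c)."""
    votes = Counter(neighbour_labels)
    return max(votes, key=lambda c: W.get(c, 1.0) * votes[c])

# Hypothetical query whose 5 nearest neighbours come mostly from the majority class
neighbours = ["major", "major", "major", "minor", "minor"]

uniform_W = {"major": 1.0, "minor": 1.0}  # plain KNN: every class weighted equally
fixed_W = {"major": 1.0, "minor": 2.0}    # query-independent weight favouring the minority

print(weighted_vote(neighbours, uniform_W))  # major
print(weighted_vote(neighbours, fixed_W))    # minor
```

With uniform weights the majority class wins 3 to 2; the fixed minority weight turns the scores into 3 versus 4, so the query is misclassified even though its neighborhood is correct.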

Summary

INTRODUCTION

The development of information technology has made digital medical technology more mature. Medical data is growing at an unprecedented rate, and biomedical research has become a typical data-intensive science, producing the data explosion known as "big data." In the big data era, data has become a new strategic resource and an important driver of innovation, and it is changing how biomedical research is done and how people live and think.

For a given query instance $x_t$, the modified KNN rule can be written as

$$y_t = \arg\max_{c} \; W[c] \sum_{x_i \in N_k(x_t)} E(y_i, c)$$

where $N_k(x_t)$ is the set of k nearest neighbors of $x_t$, $W[c]$ is the weight assigned to class $c$, and $E(y_i, c)$ equals 1 if the label $y_i$ of neighbor $x_i$ is $c$ and 0 otherwise. This weighting factor has major drawbacks: it does not perform consistently across most data sets. If k is the number of neighbors that the existing KNN algorithm uses to determine the class of a query instance, the design in this paper instead considers the class distribution among the k + d nearest neighbors of the query instance. The classification results of KNN after the improvement are significantly better than those before the improvement.
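
A minimal Python sketch of the weighted rule, assuming Euclidean distance. The source does not give the formula for W[c], so `local_class_weights` is a hypothetical placeholder that sets each class weight to that class's share among the k + d nearest neighbors; it only shows where the wider neighborhood enters the rule. All function names are illustrative.

```python
import math
from collections import Counter

def nearest_labels(X, y, x_t, m):
    """Labels of the m training points closest to x_t (Euclidean distance)."""
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], x_t))
    return [y[i] for i in order[:m]]

def local_class_weights(labels_kd):
    """HYPOTHETICAL W[c]: share of class c among the k + d nearest neighbours.
    Placeholder only; the paper's actual weighting formula is not given here."""
    counts = Counter(labels_kd)
    return {c: n / len(labels_kd) for c, n in counts.items()}

def weighted_knn_predict(X, y, x_t, k, d, weight_fn=local_class_weights):
    """y_t = argmax_c W[c] * sum over the k nearest neighbours of E(y_i, c)."""
    labels_kd = nearest_labels(X, y, x_t, k + d)  # wider k + d neighbourhood feeds the weights
    W = weight_fn(labels_kd)
    votes = Counter(labels_kd[:k])                # votes[c] = sum_i E(y_i, c) over the k nearest
    return max(votes, key=lambda c: W[c] * votes[c])

X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6)]
y = ["a", "a", "a", "b", "b"]
print(weighted_knn_predict(X, y, (0.2, 0.2), k=3, d=1))  # a
```

Passing a different `weight_fn` swaps in another weighting scheme without touching the voting step, which keeps the sketch usable whatever form W[c] actually takes.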

IMPROVED KNN BASED ON FEATURE WEIGHT CORRECTION
EXPERIMENTAL RESEARCH
DENSITY CROPPING BASED ON CLUSTER DENOISING IN KNN ALGORITHM
Findings
EVALUATION AND DISCUSSION OF EXPERIMENTAL RESULTS
