Abstract

A dataset is imbalanced when its classes are unevenly represented. This is a common issue in medical datasets, where the positive class (e.g., presence of a disease) is typically much rarer than the negative class (absence of the disease). Because machine learning algorithms tend to be biased toward the majority class, this imbalance can degrade their performance. To address the problem, an oversampling algorithm that combines SMOTE with a KNN-based SMOTE variant has been proposed. SMOTE (Synthetic Minority Oversampling Technique), a well-known oversampling approach, generates synthetic minority-class samples by interpolating between existing samples. The KNN variant, in turn, selects each sample's k nearest neighbors and combines them to produce synthetic samples in feature space. The oversampling algorithm can first be used to balance the dataset, and a random forest classifier can then be trained on the balanced data. The random forest algorithm draws random samples from the dataset and builds a decision tree on each bootstrap training set; the trees' predictions are aggregated by voting, and the outcome with the most votes is taken as the final prediction.
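The pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `smote_oversample` helper, the toy two-cluster dataset, and all parameter values (k = 5, 100 trees) are assumptions chosen for the example. It interpolates between each minority sample and one of its k nearest neighbors, then trains scikit-learn's `RandomForestClassifier` on the balanced data.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def smote_oversample(X_min, n_new, k=5):
    """Generate n_new synthetic minority samples by interpolating
    between a random base sample and one of its k nearest neighbors."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    # column 0 is each point itself, so keep only the k true neighbors
    neigh_idx = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neigh_idx[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # random point on the line segment
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

# Toy imbalanced dataset: 100 negatives vs. 10 positives
X_neg = rng.normal(0.0, 1.0, size=(100, 2))
X_pos = rng.normal(3.0, 1.0, size=(10, 2))

# Oversample the minority class to match the majority class
X_syn = smote_oversample(X_pos, n_new=90)
X = np.vstack([X_neg, X_pos, X_syn])
y = np.array([0] * 100 + [1] * 100)

# Random forest: each tree is fit on a bootstrap sample,
# and predictions are combined by majority vote.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
```

In practice, a library such as imbalanced-learn provides tested SMOTE implementations; the hand-rolled version here is only to make the interpolation step explicit.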

