Abstract

A dataset is said to be imbalanced when its classes are unevenly represented. This is a common problem in medical datasets, where the positive class (e.g., the presence of a disease) is typically much rarer than the negative class (its absence). Because machine learning algorithms tend to be biased toward the majority class, this imbalance can degrade their performance. To address the issue, an oversampling algorithm combining the SMOTE and KNN-SMOTE methods is presented. SMOTE (Synthetic Minority Oversampling Technique), a well-known oversampling approach, generates synthetic minority-class samples by interpolating between existing ones. KNN-SMOTE, in turn, selects a sample's K nearest neighbors, combines them, and produces synthetic samples in feature space. The proposed oversampling algorithm is first used to balance the dataset, and a random forest classifier is then trained on the balanced data. The random forest algorithm draws random samples from the dataset and builds a decision tree on each training subset; the trees' predictions are aggregated by voting, and the outcome receiving the most votes is taken as the final prediction.
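The SMOTE interpolation step the abstract describes can be sketched as follows. This is a minimal illustration in NumPy, not the authors' implementation: the function name `smote_oversample` and all parameter choices (20 minority samples, k = 5, 80 synthetic points) are assumptions made for the example.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=None):
    """SMOTE-style oversampling: for each synthetic point, pick a random
    minority sample, one of its k nearest minority neighbors, and
    interpolate at a random position on the segment between them."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a sample is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]      # k nearest neighbors per sample
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))       # random minority sample
        j = nn[i, rng.integers(k)]         # one of its k nearest neighbors
        lam = rng.random()                 # interpolation factor in [0, 1]
        synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synth)

rng = np.random.default_rng(0)
X_min = rng.normal(size=(20, 2))           # toy minority class: 20 samples, 2 features
X_new = smote_oversample(X_min, n_new=80, k=5, seed=1)
print(X_new.shape)                         # 80 synthetic samples, 2 features
```

The balanced data (original samples plus `X_new`) would then be fed to a random forest classifier, whose bootstrap-and-vote procedure the abstract summarizes.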

