This research classifies numerical data, namely loan data taken from Kaggle. The dataset contains 9,578 records: 8,045 borrowers who were able to repay their credit and 1,533 who were not. This class imbalance must be corrected (balanced) to obtain more accurate classification results. The purpose of this research is to improve the accuracy of the Naïve Bayes algorithm in classifying numerical data. Fraud in financial transactions is a typical example of imbalanced data, where legitimate transactions vastly outnumber fraudulent ones; optimizing accuracy on the minority (fraud) class is essential to avoid losses. The method used to improve the algorithm's accuracy is the Synthetic Minority Oversampling Technique (SMOTE), which oversamples the minority class of the dataset. K-Fold Cross Validation is also used to evaluate the performance of the resulting models. Data preprocessing is performed to clean the data of missing and invalid values and to normalize it so that all features are on the same scale and suitable for classification analysis. Based on the analysis conducted, before applying SMOTE the model's ability to recognize the minority class was 16.1%, while after applying SMOTE it rose to 48.8%. Likewise, before applying SMOTE the model correctly predicted the minority class in 10 cases, while after applying SMOTE it did so in 102 cases. It can therefore be concluded that the SMOTE technique improves the model's ability to recognize the minority class.
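For illustration, the sketch below shows one way the pipeline described above could be assembled in Python with scikit-learn and imbalanced-learn. It is a minimal sketch under stated assumptions, not the authors' implementation: the file name loan_data.csv and the target column not_fully_paid are hypothetical placeholders, the fold count of 10 is assumed (the abstract does not state it), and minority-class recall is used as the metric corresponding to "ability to recognize the minority class". Placing SMOTE inside the pipeline ensures synthetic samples are generated only from each training fold, never from the held-out fold.

```python
# A minimal sketch of the abstract's pipeline, assuming scikit-learn and
# imbalanced-learn; file path, column name, and fold count are assumptions.
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.naive_bayes import GaussianNB
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

# Load the loan dataset (path and target column are hypothetical).
df = pd.read_csv("loan_data.csv")
X = df.drop(columns=["not_fully_paid"])
y = df["not_fully_paid"]  # 1 = could not repay (minority class)

# SMOTE sits inside the pipeline so oversampling is fitted per training
# fold during cross-validation, avoiding leakage into the test fold.
pipeline = Pipeline([
    ("scale", MinMaxScaler()),          # normalize features to one scale
    ("smote", SMOTE(random_state=42)),  # oversample the minority class
    ("nb", GaussianNB()),               # Naive Bayes classifier
])

# Stratified K-Fold evaluation, scored by recall on the minority class.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="recall")
print(f"Mean minority-class recall: {scores.mean():.3f}")
```

Comparing this score against the same pipeline with the SMOTE step removed reproduces the kind of before/after contrast the abstract reports (16.1% versus 48.8% minority-class recognition).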