Bean seed classification is critical in determining the quality of beans. The same dataset was previously tested with the MLP, SVM, KNN, and DT algorithms, with SVM producing the best results. The purpose of this study is to determine the most effective model by combining the Box-Cox feature transformation with the random forest (RF), gradient boosting machine (GBM), and light GBM algorithms, evaluated with repeated k-fold cross-validation. The bean dataset is available from the UCI Machine Learning Repository. The Box-Cox transformation and repeated k-fold evaluation improved classification accuracy. The best training configurations were a random forest with 50 decision trees and a depth of 10, a gradient boosting machine with a learning rate of 1, and a light gradient boosting machine with a learning rate of 0.5 and 500 estimators. Light GBM achieved the best training accuracy, 99 percent, but only 91 percent on validation. According to the results, the Barbunya, Bombay, Cali, Dermason, Horoz, Seker, and Sira bean classes reached accuracies of 91 percent, 100 percent, 92 percent, 92 percent, 95 percent, 94 percent, and 84 percent, respectively.
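
As a rough illustration of the two preprocessing and evaluation ideas named in the abstract, the sketch below implements the Box-Cox transform for a fixed lambda and a repeated k-fold index splitter using only the Python standard library. It is not the authors' code: the function names `box_cox` and `repeated_k_fold`, the sample values, the lambda values, and the fold and repeat counts are all assumptions chosen for the example. In practice lambda is usually estimated from the data for each feature rather than fixed by hand.

```python
import math
import random


def box_cox(x, lam):
    """Box-Cox transform of one strictly positive value for a fixed lambda."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive inputs")
    if lam == 0:
        return math.log(x)          # limiting case as lambda -> 0
    return (x ** lam - 1.0) / lam


def repeated_k_fold(n_samples, n_splits, n_repeats, seed=0):
    """Yield (train_idx, val_idx) pairs: k folds, reshuffled on every repeat."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)        # a fresh random partition per repeat
        for fold in range(n_splits):
            val_idx = indices[fold::n_splits]
            val_set = set(val_idx)
            train_idx = [i for i in indices if i not in val_set]
            yield train_idx, val_idx


# Hypothetical skewed feature values (not taken from the bean dataset).
areas = [1.0, 2.0, 4.0, 8.0, 64.0]
transformed = [box_cox(a, 0.0) for a in areas]   # lambda = 0 is the log case

# Assumed settings: 5 folds repeated 3 times gives 15 train/validation splits.
splits = list(repeated_k_fold(n_samples=10, n_splits=5, n_repeats=3))
```

Each model would be fitted on `train_idx` and scored on `val_idx` for every split, and the scores averaged. Averaging over repeats reduces the dependence of the estimate on any single random partition, which is the usual motivation for repeating k-fold.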