Abstract

Bean seed classification is critical in determining the quality of beans. In earlier work, the same dataset was tested with the MLP, SVM, KNN, and DT algorithms, with SVM producing the best results. The purpose of this study is to determine the most effective model by using Box-Cox transformation for feature selection together with the random forest (RF), gradient boosting machine (GBM), and LightGBM algorithms, evaluated with repeated k-fold cross-validation. The bean dataset is available on the UCI Repository website. The Box-Cox transformation and repeated k-fold evaluation improved classification accuracy. The best training configurations were a random forest with 50 decision trees and a depth of 10, a GBM with a learning rate of 1, and a LightGBM with a learning rate of 0.5 and 500 estimators. LightGBM gave the best training accuracy, 99 percent, but reached only 91 percent on validation. According to the results, the Barbunya, Bombay, Cali, Dermason, Horoz, Seker, and Sira bean classes achieved accuracies of 91 percent, 100 percent, 92 percent, 92 percent, 95 percent, 94 percent, and 84 percent, respectively.
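The repeated k-fold evaluation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `repeated_k_fold`, the fold count, the repeat count, and the seed are all assumptions chosen for illustration, and a real experiment would more likely rely on a library splitter (for example scikit-learn's `RepeatedStratifiedKFold`).

```python
import random


def repeated_k_fold(n_samples, n_splits=5, n_repeats=3, seed=0):
    """Yield (train_idx, val_idx) pairs for repeated k-fold cross-validation.

    Hypothetical helper: the defaults are illustrative, not values from the study.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh shuffle for every repeat
        for fold in range(n_splits):
            # Every n_splits-th shuffled index forms the validation fold.
            val_idx = indices[fold::n_splits]
            val_set = set(val_idx)
            train_idx = [i for i in indices if i not in val_set]
            yield train_idx, val_idx


# 20 samples, 5 folds, 3 repeats: 15 train/validation splits in total.
splits = list(repeated_k_fold(n_samples=20, n_splits=5, n_repeats=3))
```

Each sample lands in a validation fold exactly once per repeat, so averaging the scores over all splits gives a less noisy accuracy estimate than a single k-fold pass.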

Highlights

  • Bean seed classification is critical in determining the quality of beans

  • The bean dataset is available on the UCI Repository website

  • The model is used in the optimal training phase for a random forest


Summary

Introduction

Classifying seeds is a very important factor in determining seed quality, and experts have approached it with a variety of methods. With machine learning and image recognition, dry beans can be identified from their length, shape, size, and other physical attributes. One study found that GBM improved prediction accuracy, measured by R-squared and RMSE, by more than 80 percent compared with the industry's best models, namely the random forest and linear regression algorithms [21]. In predicting miRNA in breast cancer patients with several machine learning techniques, namely XGBoost, Random Forest, and LightGBM, LightGBM outperformed the other two techniques in several respects, such as accuracy and speed [26]. This study compares the prediction accuracy of three algorithms, gradient boosting machine, random forest, and LightGBM, using Box-Cox feature selection. The comparison is tested on the classification of the dry bean dataset.
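As a rough illustration of the Box-Cox step, the transform maps a strictly positive value x to (x^λ − 1)/λ, or to ln x when λ = 0. The sketch below is an assumption-laden example, not the study's implementation: the function name `box_cox`, the sample values, and the fixed λ = 0.5 are all made up, and in practice λ is usually estimated per feature by maximum likelihood (for instance with `scipy.stats.boxcox`).

```python
import math


def box_cox(values, lam):
    """Apply the Box-Cox power transform to strictly positive values.

    Hypothetical helper for illustration; lam is supplied by the caller here
    instead of being estimated from the data.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive inputs")
    if lam == 0:
        # The limit of (x**lam - 1) / lam as lam approaches 0 is log(x).
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


# Made-up feature values, not taken from the bean dataset.
sample = [1.0, 4.0, 9.0, 16.0]
transformed = box_cox(sample, lam=0.5)
```

With λ = 0.5 the transform reduces to 2(√x − 1), which compresses large values and makes a right-skewed feature closer to symmetric.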

Bean Dataset
Box-Cox Data Normalization
Evaluation
Classification Algorithms
Tools
Variable Correlation
Model Training
Box-Cox Repeated k-folds
Random Forest One of the parameters for optimizing the Random
LightGBM
Findings
Conclusion
