Dry beans are the most widely grown edible legume crop worldwide and show high genetic diversity. Because crop production is strongly influenced by seed quality, seed classification matters for both marketing and production, and it helps build sustainable farming systems. The major contribution of this research is a multiclass classification model that uses machine learning (ML) algorithms to classify seven varieties of dry beans. The model was developed on a dataset from the UCI ML repository that contains the features of seven distinct varieties of dry beans. Because the original multiclass dataset is unbalanced, a balanced dataset was created with random undersampling to avoid biasing the ML algorithms towards the majority classes. To address the skewness of the dataset, a Box-Cox transformation (BCT) was applied to its attributes. Twenty-two ML classification algorithms were then applied to the balanced and preprocessed dataset to identify the best performer. The results were validated with 10-fold cross-validation, in which the CatBoost algorithm achieved the highest overall mean accuracy of 93.8 percent, with a range of 92.05 percent to 95.35 percent.
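The preprocessing and validation steps named in the abstract (random undersampling, Box-Cox transformation, 10-fold cross-validation) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code. The function names, the toy data, the fixed Box-Cox parameter `lam=0.5`, and the unstratified fold split are all assumptions made for illustration; the abstract does not say how lambda was chosen or how the folds were drawn.

```python
import math
import random
from collections import Counter, defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label in sorted(by_class):
        for row in rng.sample(by_class[label], n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a fixed lambda (assumed, not fitted)."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k near-equal, disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Hypothetical toy data: one positive feature, three unbalanced classes.
rows = [[float(i + 1)] for i in range(60)]
labels = ["A"] * 30 + ["B"] * 20 + ["C"] * 10

bal_rows, bal_labels = random_undersample(rows, labels)
counts = Counter(bal_labels)  # every class now has 10 rows

# Transform the single feature column to reduce skew.
transformed = box_cox([r[0] for r in bal_rows], lam=0.5)

# Each fold serves once as the test set; the remaining indices form the training set.
folds = k_fold_indices(len(bal_rows), k=10)
```

In practice the Box-Cox parameter is usually estimated per attribute by maximum likelihood, and the folds are often stratified by class. Both are omitted here to keep the sketch short.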