Abstract

Raisin grains are among the agricultural commodities that can benefit health. Raisin grains need to be classified during production to achieve optimal results. In this study, classification is carried out on two varieties of grains, Kecimen and Besni. However, inaccurate sample data can affect the performance of a model. This study therefore compares two sampling techniques, stratified and shuffled, across five classification models: RF, GBT, NB, LR, and NN. The aim is to identify how the performance of the classification models depends on the sampling technique. The models are applied to a seven-feature dataset, and modeling is done with cross-validation. The models were tested with different proportions of test data, and their performance was evaluated in terms of accuracy and AUC. With stratified sampling, the best outcome across all models was found with 40 percent test data, with a mean accuracy of 85.50% and an AUC of 0.921. With shuffled sampling, the best outcome was found with 20 percent test data, with a mean accuracy of 88.11% and an AUC of 0.935. Furthermore, with stratified sampling, not all models reached the excellent category across the data splits, whereas with shuffled sampling all models did. Therefore, models based on shuffled sampling are superior to those based on stratified sampling. According to the significance test, the performance of RF differs significantly between the sampling techniques.
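
To make the two sampling techniques concrete, the following is a minimal sketch of how a shuffled split and a stratified split can differ. The abstract does not state which tool or implementation was used, so this is only a generic illustration: the function names `shuffled_split` and `stratified_split`, the seed, and the toy label list are all assumptions, and the data is not the study's dataset.

```python
import random
from collections import Counter, defaultdict


def shuffled_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx) taken from one random permutation."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    n_test = round(len(labels) * test_fraction)
    return indices[n_test:], indices[:n_test]


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx), sampling each class separately
    so that the test set keeps the class proportions of the full set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return train_idx, test_idx


# Hypothetical toy labels (assumption): 50 grains per variety.
labels = ["Kecimen"] * 50 + ["Besni"] * 50

for name, split in [("shuffled", shuffled_split),
                    ("stratified", stratified_split)]:
    train_idx, test_idx = split(labels, test_fraction=0.2)
    # Class counts in the test set show how each technique distributes labels.
    print(name, Counter(labels[i] for i in test_idx))
```

In this toy setting the stratified test set always holds 10 grains of each variety, while the class balance of the shuffled test set depends on the random permutation. That difference in class balance is one plausible reason the two techniques could lead to different model performance.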
