Applying Different Resampling Strategies In Random Forest Algorithm To Predict Lumpy Skin Disease

Suparyati Suparyati, Emma Utami, Alva Hendi Muhammad

doi:10.29207/resti.v6i4.4147

Open Access

https://doi.org/10.29207/resti.v6i4.4147

Abstract

Lumpy Skin Disease (LSD), which infects livestock, is spreading across many parts of the world. Early detection of its spread is needed to limit the economic losses it causes. Machine learning algorithms have already been used to predict the presence of disease, including in the field of animal health. This study aims to predict the presence of LSD in an area using the LSD dataset obtained from Mendeley Data. Infected cases are so rare that the data are imbalanced, which makes training machine learning models challenging. The imbalance is handled with two sampling techniques: Random Under-sampling and the Synthetic Minority Oversampling Technique (SMOTE). A Random Forest classification model was trained on the sampled data to predict cases of lumpy infection. The Random Forest classifier performs very well on both the under-sampled and the oversampled data. The performance metrics show that SMOTE scores 1-2% higher than Random Under-sampling. Furthermore, recall, the metric we want to maximize when identifying lumpy cases, is higher with SMOTE, which also has slightly better precision than Random Under-sampling. This research focuses only on balancing the imbalanced classes, so model optimization has not been implemented, which leaves an opportunity for further research.
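The two resampling strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the function names `random_undersample` and `smote`, and the parameter `k` are assumptions made for illustration, and the Random Forest training step is omitted. Random under-sampling discards majority-class rows until the classes are equal in size. SMOTE instead creates synthetic minority rows by interpolating between a minority row and one of its nearest minority neighbours.

```python
import math
import random
from collections import Counter


def random_undersample(X, y, rng):
    """Randomly keep only as many rows per class as the smallest class has."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def smote(X, y, minority_label, k, rng):
    """Add synthetic minority rows until the minority matches the largest class.

    Each synthetic row lies on the line segment between a random minority
    row and one of its k nearest minority neighbours (Euclidean distance).
    Assumes the minority class has at least two rows.
    """
    minority = [row for row, label in zip(X, y) if label == minority_label]
    counts = Counter(y)
    n_new = max(counts.values()) - counts[minority_label]
    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: math.dist(base, m),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
        y_out.append(minority_label)
    return X_out, y_out


# Hypothetical toy data: 95 "not infected" rows (0) and 5 "infected" rows (1).
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(95)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(5)]
y = [0] * 95 + [1] * 5

X_under, y_under = random_undersample(X, y, rng)
X_over, y_over = smote(X, y, minority_label=1, k=3, rng=rng)

print("under-sampled class counts:", dict(Counter(y_under)))
print("SMOTE class counts:", dict(Counter(y_over)))
```

In this toy setting under-sampling leaves 5 rows per class, while SMOTE yields 95 rows per class, so the oversampled set keeps every original majority row. In practice a library such as imbalanced-learn, which provides `RandomUnderSampler` and `SMOTE`, would likely be used in place of these hand-written functions, with resampling applied to the training split only.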
