The Random Forest Method for Diabetes Classification

Dhea Agustina Hadi, Dwi Agustin Nuriani Sirodj

doi:10.29313/bcss.v3i2.8354

Abstract

Random Forest is a supervised learning algorithm developed from decision trees by applying bootstrap aggregating (bagging). The method grows many decision trees and combines them into a forest; the resulting ensemble is called the random forest model. Each tree is grown on data drawn at random with replacement through the bagging process. Among supervised learning methods, random forest is considered to perform better on diabetes data because it has the lowest error rate compared with the other methods. Random forest is also an important technique for classifying medical data, especially for diagnosing diabetes. In this study, classification was carried out on the Pima Indian Diabetes data; the Pima are a Native American people living in Arizona and Mexico. A random forest classifier was applied to the Pima Indian Diabetes data to measure its classification accuracy. The results show an accuracy of 74.78%, which falls into the "fair" classification category. The three most important variables in the random forest classification are, in order, glucose, BMI, and age.
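The bagging procedure described above can be sketched in a few lines of Python. This is a minimal, simplified sketch, not the authors' implementation. Each tree here is a hypothetical one-split "stump" rather than a fully grown decision tree. The toy rows are made up and are not the Pima data. The names `bootstrap_sample`, `fit_stump`, `fit_forest` and `predict` are illustrative. Each tree is trained on a bootstrap sample drawn with replacement and sees a random subset of the features, and the forest predicts by majority vote.

```python
import random
from collections import Counter

# Hypothetical toy rows of (features, label); NOT the Pima Indian Diabetes data.
ROWS = [((1, 2), 0), ((2, 1), 0), ((3, 4), 0), ((4, 3), 0),
        ((6, 7), 1), ((7, 6), 1), ((8, 9), 1), ((9, 8), 1)]


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows uniformly at random *with replacement*."""
    return [rng.choice(rows) for _ in range(len(rows))]


def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, feature_ids):
    """Fit a one-split tree (a simplification of a full decision tree).

    Tries every observed value of every candidate feature as a threshold and
    keeps the split with the fewest training errors.
    """
    best = None
    for f in feature_ids:
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]  # never empty
            right = [y for x, y in rows if x[f] > threshold]
            left_label = majority(left)
            right_label = majority(right) if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return lambda x: left_label if x[f] <= threshold else right_label


def fit_forest(rows, n_trees=25, n_features=1, seed=0):
    """Bagging plus random feature selection: one stump per bootstrap sample."""
    rng = random.Random(seed)
    n_total_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = bootstrap_sample(rows, rng)  # rows drawn with replacement
        features = rng.sample(range(n_total_features), n_features)
        forest.append(fit_stump(sample, features))
    return forest


def predict(forest, x):
    """Aggregate the trees' votes by simple majority."""
    return majority([tree(x) for tree in forest])


forest = fit_forest(ROWS)
print(predict(forest, (1, 1)), predict(forest, (9, 9)))
```

A stump makes only one split, so drawing the feature subset once per tree is equivalent to the per-split feature sampling of a full random forest. Deeper trees would redraw the subset at every node.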
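Accuracy is the share of test cases that are classified correctly: (TP + TN) / (TP + TN + FP + FN). The abstract places 74.78% in the "fair" category but does not state the band boundaries. The five-band scale below is an assumption: 0.90 and above is excellent, 0.80 good, 0.70 fair, 0.60 poor, and anything lower is failure. It was chosen because it is commonly used and is consistent with the "fair" label. The function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels: (TP + TN) / total."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# ASSUMED band boundaries as (lower bound, label); the abstract names only "fair".
BANDS = [(0.90, "excellent"), (0.80, "good"), (0.70, "fair"), (0.60, "poor")]


def accuracy_category(acc):
    """Map an accuracy in [0, 1] to its (assumed) qualitative band."""
    for lower, label in BANDS:
        if acc >= lower:
            return label
    return "failure"


print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
print(accuracy_category(0.7478))           # fair
```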
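The abstract does not say which importance measure was used to rank glucose, BMI, and age. One common choice for random forests, assumed here, is the mean decrease in Gini impurity. Gini impurity is G = 1 - sum_k p_k^2. Each split on a variable lowers it by dG = G(parent) - (n_L/n) G(left) - (n_R/n) G(right). A variable's importance accumulates these decreases over all of its splits, averaged across the trees. The sketch below computes G and dG for a single split. The labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity G = 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease of one split, weighting each child by its share of rows."""
    n = len(parent)
    weighted_children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted_children


# Hypothetical labels (1 = diabetic): a split that separates the classes
# perfectly removes all impurity; a split that separates nothing removes none.
parent = [0, 0, 1, 1]
print(gini_decrease(parent, [0, 0], [1, 1]))  # 0.5
print(gini_decrease(parent, [0, 1], [0, 1]))  # 0.0
```

Under this measure, a variable such as glucose ranks first when its splits produce the largest total impurity decrease. scikit-learn's `feature_importances_` and the MeanDecreaseGini column of R's `randomForest::importance()` report impurity-based importances of this kind.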
