Abstract

This study compares the classification accuracy of the decision tree and logistic regression methods on a genetic data set. The decision tree is a classification technique in data mining that represents a very large data sample as a smaller set of rules. Logistic regression is a method for determining the effect of one or more independent variables on a dichotomous dependent variable. Both methods were implemented and analyzed in R and applied to SNP (Single Nucleotide Polymorphism) genetic data, specifically the "asthma" data set from the R package "SNPassoc". The asthma data has 57 features: Country, Gender, Age, BMI, Smoke, Case control, and the SNP genetic codes. The two methods were compared on the accuracy values they obtained. The proportion of data used for testing was varied over 40%, 30%, 20%, and 10%, and the simulation was repeated 1000 times to obtain a more reliable accuracy value. At test proportions of 40%, 30%, 20%, and 10%, the decision tree method obtained accuracies of 0.5793, 0.5777, 0.5745, and 0.5526, respectively, while the logistic regression method obtained 0.7696, 0.7729, 0.7763, and 0.7788. It can therefore be concluded that, in this case, logistic regression classifies the asthma data better than the decision tree.
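The evaluation protocol described above (repeated random train/test splits at several test proportions) can be sketched as follows. This is a minimal illustration in Python rather than the authors' R code, and it assumes the reported accuracy is the mean over the repeated splits. The names `repeated_holdout_accuracy` and `majority_classifier` are hypothetical, the data is synthetic rather than the asthma data, and the majority-class model is only a placeholder where a decision tree or logistic regression would be fitted.

```python
import random
from collections import Counter


def majority_classifier(train_X, train_y):
    """Placeholder model (hypothetical): always predicts the most common
    training label. A decision tree or logistic regression would go here."""
    majority = Counter(train_y).most_common(1)[0][0]
    return lambda row: majority


def repeated_holdout_accuracy(X, y, fit, test_fraction, n_repeats=1000, seed=0):
    """Mean test accuracy over n_repeats random train/test splits.

    fit(train_X, train_y) must return a function mapping one row to a label.
    """
    rng = random.Random(seed)
    n = len(y)
    n_test = max(1, round(n * test_fraction))
    indices = list(range(n))
    total = 0.0
    for _ in range(n_repeats):
        rng.shuffle(indices)  # new random split on every repeat
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        total += correct / n_test
    return total / n_repeats


# Synthetic stand-in data (NOT the asthma data): 100 rows, 70% labelled 1.
X = [[i % 3] for i in range(100)]
y = [1 if i < 70 else 0 for i in range(100)]

# Same test proportions as in the abstract; fewer repeats to keep it quick.
results = {
    frac: repeated_holdout_accuracy(X, y, majority_classifier, frac, n_repeats=200)
    for frac in (0.4, 0.3, 0.2, 0.1)
}
for frac, acc in results.items():
    print(f"test fraction {frac:.0%}: mean accuracy {acc:.4f}")
```

Passing a different `fit` function for each method while keeping the same seed evaluates both methods on identical splits, which makes the resulting accuracies directly comparable.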
