Abstract

This study analyzes the smoking behaviour of people aged 15 and older in Turkey using supervised and unsupervised machine learning methods. Two supervised classifiers, C4.5 and Random Forest (RF), were trained to predict smoking behaviour, and the Apriori algorithm was used to detect associations. The data come from the Turkey Health Interview Survey 2019, with a sample size of 17,084, and were used both to predict smoking behaviour and to determine the factors affecting smoking. Data analysis and performance evaluation were performed in the R programming language using RStudio. The supervised models were compared on sensitivity, specificity, accuracy, positive predictive value (PPV), and F-measure. According to the association rules, gender, age, and alcohol consumption are the most representative attributes of smoking behaviour; associations were determined for smoking, non-smoking, and quit-smoking behaviour. The RF algorithm outperformed C4.5 and is therefore the preferred model for predicting smoking behaviour, with an accuracy of 0.909, a specificity of 0.965, a sensitivity of 0.782, a PPV of 0.908, and an F-measure of 0.840. This study contributes to the literature by applying machine learning methods to the most comprehensive national health survey data in Turkey, and it indicates that such methods can be used to analyze datasets of this kind.
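As a reading aid, the performance measures named above can be sketched in a few lines. This is an illustration in Python, not the authors' R code: the function names and the confusion-matrix counts are hypothetical, and the formulas are the standard textbook definitions, which the abstract does not spell out. The only figures taken from the abstract are the reported PPV (0.908) and sensitivity (0.782) for the RF model.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true/false positives/negatives. The function name and
    signature are illustrative, not taken from the study.
    """
    sensitivity = tp / (tp + fn)           # true positive rate (recall)
    specificity = tn / (tn + fp)           # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    ppv = tp / (tp + fp)                   # positive predictive value (precision)
    f_measure = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "ppv": ppv,
        "f_measure": f_measure,
    }


def f_measure_from(ppv, sensitivity):
    """F-measure as the harmonic mean of PPV and sensitivity."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Hypothetical counts, chosen only to exercise the function.
example = classification_metrics(tp=80, fp=10, tn=90, fn=20)

# Consistency check using the PPV and sensitivity reported in the abstract.
reported_f = f_measure_from(ppv=0.908, sensitivity=0.782)
print(f"{reported_f:.3f}")
```

Under these definitions, the reported PPV and sensitivity give an F-measure of about 0.840, which agrees with the value stated in the abstract.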
