Abstract

Objective: This paper presented a) how Global Adult Tobacco Survey (GATS) data can be used to extract valuable information about people's tobacco use behaviors and b) the prediction performance of classification algorithms applied to the GATS data. Methods: Three well-known classification methods (K-nearest neighbor, the C4.5 algorithm, and the multilayer perceptron) were applied to assess how well the smoking status of GATS participants (pre-defined classes: smoker and non-smoker) could be classified from socio-demographic characteristics (age group, gender, residence, education level, and working status). The first analysis was performed on the GATS data from Turkey. Subsequently, the model with the best performance for Turkey was also applied to six other European countries: Greece, Kazakhstan, Poland, Romania, Russia, and Ukraine. Results: All three algorithms were more confident in classifying non-smokers than smokers. The C4.5 algorithm achieved the highest correct classification rate among the algorithms on the GATS Turkey data. In addition, the C4.5 algorithm classified males in more detail than females. The comparative analysis indicated that the C4.5 algorithm correctly classified the smoking status of more than 80% of participants from Ukraine, whereas the rate was below 70% for Greece. Thus, the effects of demographic factors on smoking status may differ from one country to another. Conclusion: This paper indicated that data supplied by GATS, such as demographic data, may help to estimate the likelihood that an individual will be a smoker in the future.
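
As an illustration only, the following minimal Python sketch shows how a K-nearest neighbor classifier over categorical socio-demographic attributes, and the overall and per-class correct classification rates, could be computed. It is not the authors' implementation: the records are hypothetical toy values (not GATS data), and the Hamming distance, the choice k = 3, and the function names `knn_predict` and `classification_rates` are assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy records (NOT GATS data), each with the five attributes
# named in the abstract: age group, gender, residence, education, working status.
TRAIN = [
    (("25-44", "male", "urban", "secondary", "employed"), "smoker"),
    (("25-44", "male", "rural", "secondary", "employed"), "smoker"),
    (("45-64", "male", "urban", "primary", "employed"), "smoker"),
    (("65+", "female", "rural", "primary", "not employed"), "non-smoker"),
    (("65+", "female", "urban", "primary", "not employed"), "non-smoker"),
    (("15-24", "female", "urban", "higher", "not employed"), "non-smoker"),
]


def hamming(a, b):
    """Count the attributes on which two records differ (assumed distance)."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, record, k=3):
    """Predict a label by majority vote among the k closest training records."""
    neighbors = sorted(train, key=lambda item: hamming(item[0], record))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def classification_rates(actual, predicted):
    """Return the overall correct classification rate and the rate per class."""
    overall = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
    per_class = {}
    for cls in set(actual):
        idx = [i for i, a in enumerate(actual) if a == cls]
        per_class[cls] = sum(predicted[i] == cls for i in idx) / len(idx)
    return overall, per_class


if __name__ == "__main__":
    query = ("25-44", "male", "urban", "secondary", "employed")
    print(knn_predict(TRAIN, query))

    actual = ["smoker", "smoker", "non-smoker", "non-smoker", "non-smoker"]
    predicted = ["smoker", "non-smoker", "non-smoker", "non-smoker", "non-smoker"]
    print(classification_rates(actual, predicted))
```

Reporting the rate separately for each class makes it visible when a model classifies non-smokers more reliably than smokers, which an overall rate alone would hide.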
