Abstract

Smoking habits have been studied extensively, yet accurately predicting smoking behavior and analyzing its determinants remains difficult. Previous work struggled to predict smoking behavior precisely because the target variable was continuous, which hindered the use of feature selection techniques such as mutual information. This study addresses these problems with an approach that combines data preprocessing, feature engineering, and machine learning. We categorize smoking behavior into discrete groups, which allows feature selection based on mutual information scores. Logistic Regression, Gaussian Naive Bayes, and Random Forest Classifier models are then used to predict smoking behavior accurately. The SelectKBest method ranks features by their mutual information scores. The investigation covers health markers including BMI, haemoglobin levels, and cholesterol, and examines their influence on smoking habits. Principal Component Analysis (PCA) is applied to reduce dimensionality while preserving essential information in the dataset. With this approach and a commitment to ethical data collection practices, we aim to advance the understanding of smoking behavior and to provide insights for public health initiatives and smoking cessation efforts. The paper evaluates the outcomes of the specified algorithms and parameters and presents a comparative analysis to improve the clarity and reliability of the findings.
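
The discretize-then-select idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the paper uses SelectKBest, whereas the helpers below (`bin_target`, `mutual_information`, `select_k_best`) are hypothetical, hand-rolled stand-ins written with the Python standard library only. The bin edges, feature names, and toy values are all assumptions made up for illustration.

```python
import math
from collections import Counter

def bin_target(values, edges):
    """Map each continuous value to the index of the bin it falls in.

    `edges` are ascending lower bounds of every bin except the first
    (assumed cut points, not taken from the paper).
    """
    return [sum(v >= e for e in edges) for v in values]

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed pairs of p(x,y) * log(p(x,y) / (p(x) * p(y)))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def select_k_best(features, target, k):
    """Return the names of the k features with the highest MI score."""
    scores = {name: mutual_information(col, target)
              for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Made-up toy data, purely illustrative (not from the study's dataset).
cigarettes_per_day = [0, 0, 2, 15, 22, 0, 8, 30]
# Assumed classes: 0 = non-smoker, 1 = light smoker, 2 = heavy smoker.
smoking_class = bin_target(cigarettes_per_day, edges=[1, 10])

features = {
    "informative_marker": [0, 0, 1, 2, 2, 0, 1, 2],  # mirrors the class
    "noise_marker": [0, 1, 0, 1, 0, 1, 0, 1],        # largely unrelated
}
top = select_k_best(features, smoking_class, k=1)
```

Once the target is discrete, each feature can be scored against it and the top-k features kept, which is the role SelectKBest with a mutual information scoring function plays in the study. Continuous features such as BMI would also need binning (or an estimator designed for continuous inputs) before a count-based score like this one applies.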
