Abstract

Smoking behavior has been researched extensively, yet accurately predicting it and thoroughly analyzing its determinants remain difficult. Previous studies struggled to predict smoking behavior accurately, mainly because the target variable was continuous, which hinders the application of feature selection methods such as mutual information. This research addresses these challenges with an approach that combines data preprocessing, feature engineering, and machine learning techniques. To overcome the issue of a continuous target variable, our methodology categorizes smoking behavior into discrete groups, which allows feature selection based on mutual information scores. Logistic regression, Gaussian Naive Bayes, and Random Forest classifier models are employed to achieve highly accurate predictions of smoking behavior. The SelectKBest method is used to assess the significance of features based on their mutual information scores. The research explores various health indicators, including BMI, haemoglobin levels, and cholesterol, providing insights into their impact on smoking behavior. Principal component analysis (PCA) is also used to reduce dimensionality while retaining essential information from the dataset. Through this approach and a commitment to ethical data collection practices, our goal is to advance the understanding of smoking behavior, overcome previous challenges, and offer valuable insights for public health initiatives and smoking cessation efforts. The study evaluates results using the specified algorithms and parameters and presents a comparative analysis to improve the clarity and robustness of the findings.
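The central step described above, discretizing a continuous target so that mutual information can rank features, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, the data are synthetic, and the helper names (`discretize`, `mutual_information`, `select_k_best`, `make_synthetic_data`) and the choice of three equal-frequency bins are assumptions made for illustration. In practice, scikit-learn's `SelectKBest` with `mutual_info_classif` would typically fill this role.

```python
import math
import random
from collections import Counter


def discretize(values, n_bins=3):
    """Equal-frequency binning: map each value to a label in 0..n_bins-1."""
    ordered = sorted(values)
    # Cut points sit at the quantile boundaries of the sorted values.
    cuts = [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]
    return [sum(v >= c for c in cuts) for v in values]


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def select_k_best(features, target, k):
    """Rank features by mutual information with the target; keep the top k."""
    scores = {
        name: mutual_information(discretize(column), target)
        for name, column in features.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


def make_synthetic_data(n=1000, seed=0):
    """Synthetic, illustrative data only; the coefficients are made up."""
    rng = random.Random(seed)
    haemoglobin = [rng.gauss(0, 1) for _ in range(n)]
    cholesterol = [rng.gauss(0, 1) for _ in range(n)]
    bmi = [rng.gauss(0, 1) for _ in range(n)]  # unrelated to the target here
    # Hypothetical continuous "smoking intensity" score.
    intensity = [
        2.0 * h + 1.0 * c + rng.gauss(0, 0.1)
        for h, c in zip(haemoglobin, cholesterol)
    ]
    features = {"bmi": bmi, "haemoglobin": haemoglobin, "cholesterol": cholesterol}
    return features, intensity


features, intensity = make_synthetic_data()
target = discretize(intensity, n_bins=3)  # continuous target -> discrete groups
selected, scores = select_k_best(features, target, k=2)
print("selected features:", selected)
for name, score in scores.items():
    print(f"{name}: {score:.3f} nats")
```

The features themselves are also binned here so that a simple count-based estimate of mutual information applies. Estimators that handle continuous features directly avoid that extra step.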
