Prediction of Smoking Behavior From Single Nucleotide Polymorphisms With Machine Learning Approaches.

Yi Xu,Bin Zhang,Yan Wang,Qiang Liu,Jennie Z Ma,Lanjuan Li,Thomas J Payne,Ming D Li,Yunlong Ma,Liyu Cao,Xinyi Zhao,Yinghao Yao,Ying Mao

doi:10.3389/fpsyt.2020.00416

Abstract

Smoking is a complex behavior with a heritability as high as 50%. Such a large genetic contribution offers an opportunity to keep individuals who are susceptible to smoking dependence from ever starting to smoke, by predicting their inherited predisposition from their genomic profiles. Although previous studies have identified many susceptibility variants for smoking, these variants have limited power to predict smoking behavior. We applied the support vector machine (SVM) and random forest (RF) methods to build prediction models for smoking behavior. We first used 1,431 smokers and 1,503 non-smokers of African origin for model building with 10-fold cross-validation and then tested the prediction models on an independent dataset consisting of 213 smokers and 224 non-smokers. The SVM model with 500 top single nucleotide polymorphisms (SNPs) selected using logistic regression (p < 0.01) as the feature selection method achieved an area under the curve (AUC) of 0.691, 0.721, and 0.720 for the training, test, and independent test samples, respectively. The RF model with 500 top SNPs selected using logistic regression (p < 0.01) achieved AUCs of 0.671, 0.665, and 0.667 for the training, test, and independent test samples, respectively. Finally, we used combined logistic (p < 0.01) and LASSO (λ = 10^-3) regression to select features and the SVM algorithm for model building. The SVM model with 500 top SNPs achieved AUCs of 0.756, 0.776, and 0.897 for the training, test, and independent test samples, respectively. We conclude that machine learning methods are a promising means of building predictive models for smoking.
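
All of the results above are reported as AUC values. As a reminder of what that number measures, here is a minimal sketch in Python: the AUC is the probability that a randomly chosen smoker receives a higher model score than a randomly chosen non-smoker, with ties counted as one half. The function name `auc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 class labels (1 = positive, e.g. smoker).
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three positives, three negatives.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
example_auc = auc(labels, scores)
print(round(example_auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values should be read.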

Highlights

Tobacco smoking is one of the most important public health problems throughout the world [1].
After completing all machine learning processes, we found that the support vector machine (SVM) model with the combined feature selection approach of both logistic regression and least absolute shrinkage and selection operator (LASSO) regression appeared to be better than the models using only one method for both the test and independent test samples, regardless of the number of single nucleotide polymorphisms (SNPs) included in each model (Table 4).
Given the results obtained from this series of parameter selections and machine learning methods, we concluded that the SVM model with the combined logistic regression (P < 0.01) and LASSO regression (λ = 10^-3) as the feature selection method represented the best approach for developing our prediction model for the datasets used in this study.
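
The combined approach described in these highlights (a p-value filter, then LASSO, then an SVM evaluated with 10-fold cross-validation) can be sketched with scikit-learn. This is an illustration under stated assumptions, not the authors' implementation: the genotypes are synthetic; an ANOVA F-test filter (`SelectFpr` with `f_classif`) stands in for per-SNP logistic regression p-values; scikit-learn's `Lasso(alpha=1e-3)` may be parameterized differently from the λ used in the study; the RBF kernel is an assumed choice; and the top-500 ranking step is omitted.

```python
import numpy as np
from sklearn.feature_selection import SelectFpr, SelectFromModel, f_classif
from sklearn.linear_model import Lasso
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for genotype data: 400 subjects x 200 SNPs coded 0/1/2.
n_subjects, n_snps, n_causal = 400, 200, 10
maf = rng.uniform(0.2, 0.5, size=n_snps)  # hypothetical minor allele frequencies
X = rng.binomial(2, maf, size=(n_subjects, n_snps)).astype(float)

# Only the first 10 SNPs influence the simulated smoking status.
effects = np.zeros(n_snps)
effects[:n_causal] = 0.8
logit = (X - X.mean(axis=0)) @ effects
y = (rng.random(n_subjects) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Feature selection sits inside the pipeline, so it is refit on the
# training part of every fold and never sees the held-out samples.
model = make_pipeline(
    SelectFpr(f_classif, alpha=0.01),                    # keep SNPs with p < 0.01 (F-test stand-in)
    SelectFromModel(Lasso(alpha=1e-3, max_iter=10000)),  # keep SNPs with nonzero LASSO weight
    SVC(kernel="rbf", C=1.0),                            # assumed kernel and cost
)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
auc_scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print("mean 10-fold AUC:", auc_scores.mean())
```

Keeping both selection steps inside the pipeline is a deliberate design choice: selecting SNPs on the full dataset before cross-validation would leak label information into the held-out folds and inflate the AUC.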

Summary

Introduction

Tobacco smoking is one of the most important public health problems throughout the world [1]. According to a World Health Organization report, the number of deaths caused by tobacco smoking will reach 6 million worldwide annually by 2020 [2]. Without significant efforts to limit tobacco smoking, this number will rise to 8.3 million by 2030 [3]. Prevention of smoking initiation has become a critical step in tobacco control [4,5,6,7]. Stopping individuals susceptible to nicotine dependence from starting to smoke represents an effective way to achieve tobacco control. Tobacco smoking is a complex and multifactorial behavior determined by both genetic and environmental factors, as well as by gene-by-gene and gene-by-environment interactions [8, 9]. It is feasible to predict an individual's inherited predisposition to smoking on the basis of the genomic profile.

Journal: Frontiers in Psychiatry
Publication Date: May 14, 2020
Citations: 14
License type: CC BY 4.0
