Genetic Risk Score Increased Discriminant Efficiency of Predictive Models for Type 2 Diabetes Mellitus Using Machine Learning: Cohort Study.

Yikang Wang, Miaomiao Niu, Zhenfei Wang, Ruiying Li, Liying Zhang, Runqi Tu, Xiaotian Liu, Jian Hou, Zhenxing Mao, Chongjian Wang

doi:10.3389/fpubh.2021.606711

Open Access

https://doi.org/10.3389/fpubh.2021.606711

Journal: Frontiers in Public Health
Publication Date: Feb 17, 2021
Citations: 10
License type: CC BY 4.0

Affiliation: Zhengzhou University

Abstract

Background: Previous studies have constructed prediction models for type 2 diabetes mellitus (T2DM), but machine learning was rarely used and few focused on genetic prediction. This study aimed to establish an effective T2DM prediction tool and to further explore the potential of genetic risk scores (GRS) via various classifiers among rural adults.

Methods: In this prospective study, the GRS was calculated for a total of 5,712 participants from the Henan Rural Cohort Study. Cox proportional hazards (CPH) regression was used to analyze the associations between GRS and T2DM. CPH, artificial neural network (ANN), random forest (RF), and gradient boosting machine (GBM) classifiers were each used to establish prediction models. The area under the receiver operating characteristic curve (AUC) and net reclassification index (NRI) were used to assess the discrimination ability of the models. Decision curves were plotted to determine the clinical utility of the prediction models.

Results: Compared with individuals in the lowest quintile of the GRS, the hazard ratio (HR) (95% CI) was 2.06 (1.40 to 3.03) for those in the highest quintile (P trend < 0.05). Based on conventional predictors, the AUCs of the prediction models were 0.815, 0.816, 0.843, and 0.851 via CPH, ANN, RF, and GBM, respectively. The changes in AUC with the integration of GRS for CPH, ANN, RF, and GBM were 0.001, 0.002, 0.018, and 0.033, respectively. Reclassification improved significantly for all classifiers when GRS was added (NRI: 41.2% for CPH; 41.0% for ANN; 46.4% for RF; 45.1% for GBM). Decision curve analysis indicated the clinical benefit of the models combined with GRS.

Conclusion: The prediction model combined with GRS may provide incremental predictive performance beyond conventional factors for T2DM, demonstrating the potential clinical use of genetic markers to screen vulnerable populations.

Clinical Trial Registration: The Henan Rural Cohort Study is registered in the Chinese Clinical Trial Register (Registration number: ChiCTR-OOC-15006699). http://www.chictr.org.cn/showproj.aspx?proj=11375.
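The two discrimination measures named in the abstract, AUC and NRI, can be illustrated with a minimal sketch. The abstract does not say which NRI variant was used, so the continuous (category-free) form below is an assumption, and the six toy participants and their risk values are invented purely for illustration; none of this is the authors' code.

```python
from itertools import product


def auc(y_true, scores):
    """AUC as the probability that a random case outscores a random
    non-case, counting ties as one half."""
    cases = [s for y, s in zip(y_true, scores) if y == 1]
    controls = [s for y, s in zip(y_true, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c, n in product(cases, controls)
    )
    return wins / (len(cases) * len(controls))


def continuous_nri(y_true, risk_old, risk_new):
    """Category-free NRI (assumed variant): net share of cases whose
    predicted risk rises plus net share of non-cases whose risk falls."""
    up_case = down_case = up_ctrl = down_ctrl = n_case = n_ctrl = 0
    for y, old, new in zip(y_true, risk_old, risk_new):
        if y == 1:
            n_case += 1
            up_case += new > old
            down_case += new < old
        else:
            n_ctrl += 1
            up_ctrl += new > old
            down_ctrl += new < old
    return (up_case - down_case) / n_case + (down_ctrl - up_ctrl) / n_ctrl


# Hypothetical toy data: 1 = incident T2DM, 0 = no T2DM.
y = [1, 1, 1, 0, 0, 0]
risk_conventional = [0.60, 0.40, 0.30, 0.50, 0.20, 0.10]
risk_with_grs = [0.70, 0.55, 0.28, 0.40, 0.25, 0.05]

delta_auc = auc(y, risk_with_grs) - auc(y, risk_conventional)
nri = continuous_nri(y, risk_conventional, risk_with_grs)
print(f"AUC change: {delta_auc:.3f}, NRI: {nri:.1%}")
```

In this sketch a small AUC change can coexist with a large NRI, which is the same pattern the Results report for CPH and ANN: the NRI counts every participant whose risk moves in the right direction, while the AUC only changes when the ranking of cases against non-cases changes.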

Highlights

For this study, we focused on participants with a known outcome, complete predictors, and genotype data. This resulted in 5,712 individuals after the exclusion of prevalent type 2 diabetes mellitus (T2DM) cases at baseline (n = 764), unknown incident T2DM (n = 686), and incomplete epidemiology and genotype data (n = 1,106)
Previous studies have constructed prediction models for type 2 diabetes mellitus (T2DM), but machine learning was rarely used and few focused on genetic prediction

Summary

Introduction

Type 2 diabetes mellitus (T2DM) is a global health threat [1]. Previous studies have constructed prediction models for T2DM, but machine learning was rarely used and few focused on genetic prediction. This study aimed to establish an effective T2DM prediction tool and to further explore the potential of genetic risk scores (GRS) via various classifiers among rural adults. We focused on participants with a known outcome, complete predictors, and genotype data. This resulted in 5,712 individuals after the exclusion of prevalent T2DM cases at baseline (n = 764), unknown incident T2DM (n = 686), and incomplete epidemiology and genotype data (n = 1,106). The participants were divided into training (70%, n = 3,998) and test datasets (30%, n = 1,714) to establish models and evaluate prediction performance, respectively.
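The split sizes quoted above (a 30% test set of n = 1,714 out of 5,712 participants, leaving 3,998 for training) can be reproduced with a short sketch. The text does not describe how the split was drawn, so the random shuffle, the fixed seed, and the function name `split_train_test` are all assumptions made for illustration.

```python
import random


def split_train_test(ids, train_fraction=0.7, seed=0):
    """Shuffle participant IDs and cut them into training and test sets.

    The shuffle and seed are hypothetical; the source only reports the
    resulting set sizes.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 5,712 participants remained after the exclusions described in the text.
participant_ids = range(5712)
train_ids, test_ids = split_train_test(participant_ids)
print(len(train_ids), len(test_ids))
```

Keeping the test participants entirely out of model fitting is what allows the reported AUC and NRI values to be read as estimates of performance on unseen individuals.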
