Detecting key genes relative expression orderings as biomarkers for machine learning-based intelligent screening and analysis of type 2 diabetes mellitus

Xueqin Xie, Changchun Wu, Caiyi Ma, Dong Gao, Wei Su, Jian Huang, Kejun Deng, Dan Yan, Hao Lin

doi:10.1016/j.eswa.2024.124702

Abstract

Type 2 diabetes (T2D) is a chronic metabolic disease of public health concern whose prevalence is rising rapidly worldwide. To characterize robust transcriptomic differences between T2D and healthy pancreatic islets, we propose a novel data analysis approach based on within-sample relative expression orderings (REOs) of genes and machine learning, which can efficiently integrate data from diverse gene expression datasets and help discover potential signatures. We first identified the overlapping disease-specific reverse REOs across three substantially distinct bulk islet expression profiles. Subsequently, minimum-redundancy-maximum-relevance (mRMR) ranking combined with an incremental feature selection (IFS) strategy was applied, with a support vector machine (SVM) as the classifier, to select the optimal reverse REOs. As a result, 7 optimal reverse REOs encompassing 12 genes were found. They achieved encouraging predictive performance on the training data, in independent testing, and on external islet data, indicating robust prediction capability and strong generalization despite the heterogeneity of the data. Furthermore, differentially expressed gene (DEG) analysis of bulk islet data and a pseudobulk analysis of single-cell RNA sequencing (scRNA-seq) islet data reaffirmed the crucial roles of these 7 REOs in islet function and disease progression. In conclusion, these REOs are robust biomarkers of T2D and may also be potential therapeutic targets. We hope these findings can help elucidate the underlying pathogenic mechanisms of T2D and contribute to the treatment of diabetes. The source code is available at https://github.com/xiexq007/T2D-REO.
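To make the notion of a reverse REO concrete, the following is a minimal Python sketch, not the authors' implementation (which is in the repository linked above). It treats an REO as a within-sample binary comparison between two genes and flags a gene pair as "reversed" when the ordering is stable in one direction in controls and stable in the opposite direction in T2D samples. The gene names (`G1`, `G2`, `G3`), the expression values, the function names, and the simple fraction-based stability threshold of 0.9 are all assumptions made for illustration; the paper may use a different statistical criterion.

```python
from itertools import combinations


def reo_feature(sample, gene_a, gene_b):
    """Return 1 if gene_a is expressed above gene_b within one sample, else 0."""
    return 1 if sample[gene_a] > sample[gene_b] else 0


def ordering_fraction(samples, gene_a, gene_b):
    """Fraction of samples in which gene_a > gene_b."""
    return sum(reo_feature(s, gene_a, gene_b) for s in samples) / len(samples)


def reverse_reos(controls, cases, genes, threshold=0.9):
    """Find gene pairs whose ordering is stable in controls and flipped in cases.

    Each returned tuple (a, b) means: a > b in controls, a not above b in cases.
    The fraction threshold is a hypothetical stand-in for a formal test.
    """
    reversed_pairs = []
    for a, b in combinations(genes, 2):
        f_ctrl = ordering_fraction(controls, a, b)
        f_case = ordering_fraction(cases, a, b)
        if f_ctrl >= threshold and (1 - f_case) >= threshold:
            reversed_pairs.append((a, b))
        elif (1 - f_ctrl) >= threshold and f_case >= threshold:
            reversed_pairs.append((b, a))
    return reversed_pairs


# Toy expression values (invented for illustration only).
controls = [
    {"G1": 5.0, "G2": 2.0, "G3": 1.0},
    {"G1": 6.0, "G2": 3.0, "G3": 0.5},
]
cases = [
    {"G1": 1.0, "G2": 4.0, "G3": 0.8},
    {"G1": 2.0, "G2": 5.0, "G3": 1.5},
]

pairs = reverse_reos(controls, cases, ["G1", "G2", "G3"])
print(pairs)
```

Because each feature depends only on the rank of two genes inside a single sample, it is unaffected by monotone per-sample scaling, which is one reason such features can be pooled across heterogeneous datasets. The resulting binary features could then be ranked and passed to a classifier, as the abstract describes for mRMR, IFS, and SVM.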

Full Text