Abstract

Rheumatoid arthritis (RA) is an incurable disease that afflicts 0.5–1.0% of the global population, though it is less threatening at its early stage. Therefore, improved diagnostic efficiency and prognostic outcomes are critical for confronting RA. Although machine learning is considered a promising technique in clinical research, its potential for verifying the biological significance of genes has not been fully exploited. The performance of a machine learning model depends greatly on the features used for model training; therefore, the effectiveness of prediction might reflect the quality of the input features. In the present study, we used weighted gene co-expression network analysis (WGCNA) in conjunction with differentially expressed gene (DEG) analysis to select key genes that were highly associated with RA phenotypes based on multiple microarray datasets of RA blood samples; these genes were then used as features in machine learning model validation. A total of six machine learning models were used to validate the biological significance of the key genes based on gene expression, among which five models achieved good performance [area under the curve (AUC) > 0.85], suggesting that the identified key genes are biologically significant and highly representative of genes involved in RA. Combined with other biological interpretations, including Gene Ontology (GO) analysis, protein–protein interaction (PPI) network analysis, and inference of immune cell composition, our study might shed light on the in-depth study of RA diagnosis and prognosis.
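For readers unfamiliar with the AUC criterion mentioned in the abstract, the following is a minimal sketch of how an AUC can be computed from predicted scores, using the rank-based (Mann-Whitney) definition. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; they are not data or code from the study, which does not state how its AUC values were computed.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample.
    Ties between a positive and a negative score count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = RA sample, 0 = control), not from the study.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)

# The abstract treats AUC > 0.85 as good performance.
passes_threshold = auc > 0.85
```

In this toy case, eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9. The pairwise loop is quadratic in the number of samples; it is chosen here for clarity rather than speed.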

Highlights

  • Rheumatoid arthritis (RA) is a long-term autoimmune disease that provokes synovial inflammation (Song and Lin, 2017) and predominantly inflicts cumulative damage on joints (Smolen et al., 2016)

  • Gene Ontology (GO) term enrichment analysis was performed on the differentially expressed genes (DEGs); our results focused mainly on three GO categories, namely, biological process (BP), molecular function (MF), and cellular component (CC), along with Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis

  • Key genes were obtained from the intersection of the DEGs in the test set and the hub genes in the training set, including 8 upregulated genes and downregulated genes, among which the features selected by Lasso were used in subsequent machine learning modeling
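The intersection step described in the last highlight can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `select_key_genes`, the use of the sign of a log2 fold change to split up- and downregulated genes, and the placeholder gene identifiers are all hypothetical and are not taken from the study. The subsequent Lasso selection is not shown.

```python
def select_key_genes(deg_log2fc, hub_genes):
    """Return (up, down): hub genes that are also DEGs, split by the
    sign of their log2 fold change.

    deg_log2fc: dict mapping DEG identifier -> log2 fold change (assumed input)
    hub_genes:  iterable of hub gene identifiers from the co-expression network
    """
    key = set(deg_log2fc) & set(hub_genes)
    up = sorted(g for g in key if deg_log2fc[g] > 0)
    down = sorted(g for g in key if deg_log2fc[g] < 0)
    return up, down


# Placeholder identifiers and values, not genes or results from the study.
deg_log2fc = {"GENE_A": 1.8, "GENE_B": -1.2, "GENE_C": 2.1, "GENE_D": -0.9}
hub_genes = ["GENE_A", "GENE_B", "GENE_E"]

up, down = select_key_genes(deg_log2fc, hub_genes)
```

Here only GENE_A and GENE_B appear in both inputs, so they are the key genes; GENE_C and GENE_D are DEGs but not hub genes, and GENE_E is a hub gene but not a DEG.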



Introduction

Rheumatoid arthritis (RA) is a long-term autoimmune disease that provokes synovial inflammation (Song and Lin, 2017) and predominantly inflicts cumulative damage on joints (Smolen et al., 2016). The incidence of RA is higher in females than in males, and its pathogenesis depends mainly on genetic factors, especially immune-associated genes (Song and Lin, 2017). Aside from acute-onset RA, which immediately perturbs the immune system, most preclinical RA, whose clinical symptoms are still inconspicuous, could be halted through customized interventions, thereby preventing the establishment of RA (Smolen et al., 2016). RA is heterogeneous: its clinical symptoms and pathogenesis vary across patients who receive the same diagnosis (Smolen et al., 2016). There is a fundamental need to understand the molecular mechanisms underlying this heterogeneity in order to improve both diagnostic and prognostic outcomes.
