Machine learning approaches to predict lupus disease activity from gene expression data

Brian Kegerreis, Adam C Labonte, Keith A Crandall, Michelle D Catalina, Nicholas S Geraci, Prathyusha Bachali, Amrie C Grammer, Chen Zeng, Peter E Lipsky, Nathaniel Stearrett

doi:10.1038/s41598-019-45989-0


Open Access


Abstract

The integration of gene expression data to predict systemic lupus erythematosus (SLE) disease activity is a significant challenge because of the high degree of heterogeneity among patients and study cohorts, especially those collected on different microarray platforms. Here we deployed machine learning approaches to integrate gene expression data from three SLE data sets and used the integrated data to classify patients as having active or inactive disease, as defined by standard clinical composite outcome measures. Both raw whole blood gene expression data and informative gene modules generated by Weighted Gene Co-expression Network Analysis (WGCNA) from purified leukocyte populations were employed with various classification algorithms. Classifiers were evaluated either by 10-fold cross-validation across the three combined data sets or by training and testing in independent data sets, the latter of which amplified the effects of technical variation. A random forest classifier achieved a peak classification accuracy of 83 percent under 10-fold cross-validation, but its performance could be severely affected by technical variation among data sets. The use of gene modules rather than raw gene expression was more robust, achieving classification accuracies of approximately 70 percent regardless of how the training and testing sets were formed. Fine-tuning the algorithms and parameter sets may yield accuracy sufficient to be informative as a standalone estimate of disease activity.
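The two evaluation schemes in the abstract differ only in how samples are assigned to training and test sets. The following is a minimal Python sketch of both split schemes, assuming each sample has a hypothetical ID mapped to a hypothetical study label; the held-out-study variant is written as a leave-one-study-out arrangement, which may not match the paper's exact design. No classifier is included: any model (for example a random forest) would be fit on `train` and scored on `test`.

```python
import random
from collections import defaultdict


def kfold_splits(sample_ids, k=10, seed=0):
    """Pooled k-fold CV: samples from all studies are shuffled together,
    so each test fold usually mixes samples from every study."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]  # k near-equal folds
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test


def leave_one_study_out_splits(study_of):
    """Study-based CV: each study is held out once as the test set and the
    classifier is trained only on samples from the remaining studies."""
    by_study = defaultdict(list)
    for sample, study in study_of.items():
        by_study[study].append(sample)
    for held_out in sorted(by_study):
        test = by_study[held_out]
        train = [s for st, samples in by_study.items() if st != held_out for s in samples]
        yield held_out, train, test


if __name__ == "__main__":
    # Hypothetical cohort: three studies "A", "B", "C" with 20, 15 and 10 samples.
    study_of = {f"{st}{i}": st for st, n in [("A", 20), ("B", 15), ("C", 10)] for i in range(n)}
    for held_out, train, test in leave_one_study_out_splits(study_of):
        print(held_out, len(train), len(test))
    # A 25 20
    # B 30 15
    # C 35 10
```

Under pooled k-fold splitting, study-specific technical effects are present in both training and test folds; under the held-out-study split the classifier never sees the test study's platform during training, which is consistent with the abstract's observation that training and testing in independent data sets amplified the effects of technical variation.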

Highlights

In systemic lupus erythematosus (SLE), defects in central and peripheral tolerance allow for activation of self-reactive B cell clones and differentiation into plasmablasts/plasma cells (PCs) that secrete autoantibodies, which in turn mediate tissue damage[1,2].
Gene expression values provide high accuracy under 10-fold cross-validation but are rendered nearly useless under study-based cross-validation. These results indicate that disease activity classification based on raw gene expression, while more accurate, is sensitive to technical variability, whereas classification based on module enrichment copes better with variation among data sets.
We demonstrated that differential expression (DE) analysis of active versus inactive patients is insufficient for proper classification of SLE disease activity, as systematic differences between data sets render conventional bioinformatics techniques largely non-generalizable.
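One simple way to see why summarizing genes into modules can cope better with variation among data sets is to standardize each gene within its own study before averaging over a module. The sketch below is a simplified stand-in for module scoring, not the module enrichment method used in the paper: the gene names `G1` and `G2` are hypothetical, and the score is just the mean within-study z-score of the module's genes. It removes only per-gene, per-study shifts and scale changes, not other kinds of platform differences.

```python
from statistics import mean, pstdev


def zscore_within_study(expr):
    """expr maps gene -> expression values for the samples of ONE study.
    Returns gene -> z-scores computed within that study only."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # Genes with no variation in this study contribute 0 instead of dividing by zero.
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def module_scores(expr, module_genes):
    """Per-sample score: mean within-study z-score of the module genes present in expr.
    Raises statistics.StatisticsError if none of the module genes are present."""
    z = zscore_within_study(expr)
    genes = [g for g in module_genes if g in z]
    n_samples = len(next(iter(expr.values())))
    return [mean(z[g][i] for g in genes) for i in range(n_samples)]


if __name__ == "__main__":
    # Hypothetical module genes "G1" and "G2" measured in one study.
    study = {"G1": [1.0, 2.0, 3.0, 4.0], "G2": [2.0, 2.5, 5.0, 6.5]}
    # The same study as it might look on a platform adding +10 to every gene.
    shifted = {g: [v + 10.0 for v in vals] for g, vals in study.items()}
    print(module_scores(study, ["G1", "G2"]))
    print(module_scores(shifted, ["G1", "G2"]))  # the offset cancels; same scores as above
```

Raw expression values carry such offsets directly into a classifier, whereas a within-study module summary discards them, which is one plausible reason module-based features transfer better between data sets.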

Summary

Introduction

In SLE, defects in central and peripheral tolerance allow for activation of self-reactive B cell clones and differentiation into plasmablasts/plasma cells (PCs) that secrete autoantibodies, which in turn mediate tissue damage[1,2]. Jourde-Chiche et al. reported a discrete group of differentially expressed genes that might be found in subjects with SLE renal disease[28], and Banchereau et al. extensively analyzed pediatric lupus samples and attempted to associate modules of expressed genes with disease manifestations in children[30]. Despite these advances, gene expression data have yet to provide an approach with sufficient predictive value to inform decision making about individual subjects with SLE. When applied to high-throughput transcriptomic data, machine learning algorithms could potentially identify the gene expression features most useful for recognizing subjects with higher degrees of disease activity, and may also provide insights into disease pathogenesis.



Journal: Scientific Reports
Publication Date: Jul 3, 2019
Citations: 64
License type: open-access


Similar Papers


Identification of potential blood biomarkers for Parkinson’s disease by gene expression and DNA methylation data integration analysis
Changliang Wang ... Menglei Zhang
Clinical Epigenetics | VOL. 11
11 Feb 2019

Bioinformatics analyses of combined databases identify shared differentially expressed genes in cancer and autoimmune disease
Yuan Sui ... Shuping Li
Journal of Translational Medicine | VOL. 21
10 Feb 2023

Genomic Heterogeneity in B-Cell Malignancies
Daphne R Friedman ... Joseph R Nevins
Blood | VOL. 118
18 Nov 2011

Analyzing High-Dimensional Gene Expression and DNA Methylation Data with R
Hongmei Zhang
14 May 2020
