Heterogeneous Gene Expression Data Research Articles

Background: Transcriptomic datasets often contain undeclared heterogeneity arising from biological variation, such as diversity of disease subtypes, treatment subgroups, time-series gene expression, and nested experimental conditions, as well as from technical variation due to batch effects, platform differences in integrated meta-analyses, etc. However, current analysis approaches are primarily designed to handle comparisons between experimental conditions represented by homogeneous samples, thus precluding the discovery of underlying subphenotypes. Unsupervised methods for subtype identification are typically based on individual gene-level analysis, which often results in irreproducible gene signatures for potential subtypes. Emerging methods to study heterogeneity have been largely developed in the context of single-cell datasets containing hundreds to thousands of samples, limiting their use to select contexts.

Results: We present a novel analysis method, SPSNet, which identifies subtype-specific gene expression signatures based on the activity of subnetworks in biological pathways. SPSNet identifies the gene subnetworks capturing the diversity of underlying biological mechanisms, indicating potential sample subphenotypes. In the presence of extrinsic or non-biological heterogeneity (e.g. batch effects), SPSNet identifies subnetworks that are particularly affected by such variation, thus helping eliminate factors irrelevant to the biology of the phenotypes under study.

Conclusion: Using multiple publicly available datasets, we illustrate that SPSNet is able to consistently uncover patterns within gene expression data that correspond to meaningful heterogeneity of various origins. We also demonstrate the performance of SPSNet as a sensitive and reliable tool for understanding the structure and nature of such heterogeneity.

High-throughput gene expression data are often obtained from pure or complex (heterogeneous) biological samples. In the latter case, the data are a mixture of different cell types, and this heterogeneity complicates their analysis. To draw conclusions from gene expression data obtained from heterogeneous samples, methods such as microdissection and flow cytometry have been employed to physically separate the constituent cell types. However, these manual approaches are time consuming when measuring the responses of multiple cell types simultaneously. In addition, exposed samples often end up being contaminated by external perturbations, which may alter the yield of molecular content. In this paper, we model the heterogeneous gene expression data using a Bayesian framework, treating the cell type proportions and the cell-type specific expressions as the parameters of the model. Specifically, we present a novel sequential Monte Carlo (SMC) sampler for estimating the model parameters by approximating their posterior distributions with a set of weighted samples. The SMC framework is a robust and efficient approach in which we construct a sequence of artificial target (posterior) distributions on spaces of increasing dimensions which admit the distributions of interest as marginals. The proposed algorithm is evaluated on simulated datasets and publicly available real datasets, including Affymetrix oligonucleotide arrays and National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) datasets, with varying numbers of cell types. The results obtained on all datasets show superior performance, with improved accuracy in the estimation of cell type proportions and cell-type specific expressions and more accurate identification of differentially expressed genes, when compared to other widely known methods for blind decomposition of heterogeneous gene expression data such as Dsection and the nonnegative matrix factorization (NMF) algorithms. A MATLAB implementation of the proposed SMC algorithm is available to download at https://github.com/moyanre/smcgenedeconv.git.
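To make the deconvolution setting concrete, the sketch below simulates mixed expression data and recovers cell type proportions with plain NMF, one of the baseline methods named in the abstract. It is not the SMC sampler from the paper. It assumes a linear mixing model (observed expression = cell-type specific expression × proportions), which the abstract does not spell out, and the function names `simulate_mixture` and `nmf_deconvolve` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_mixture(n_genes=200, n_types=3, n_samples=40, noise=0.01, seed=0):
    """Simulate heterogeneous samples under an assumed linear mixing model."""
    rng = np.random.default_rng(seed)
    # Cell-type specific expression: one column per cell type.
    S = rng.gamma(shape=2.0, scale=2.0, size=(n_genes, n_types))
    # Cell type proportions: one column per sample, each column sums to 1.
    P = rng.dirichlet(np.ones(n_types), size=n_samples).T
    # Observed mixed expression plus a small non-negative noise term.
    X = S @ P + noise * rng.random((n_genes, n_samples))
    return X, S, P


def nmf_deconvolve(X, n_types, n_iter=500, seed=1, eps=1e-9):
    """Blind decomposition X ~ S @ P using multiplicative NMF updates."""
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    S = rng.random((n_genes, n_types)) + eps
    P = rng.random((n_types, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the squared (Frobenius) error;
        # they keep both factors non-negative.
        P *= (S.T @ X) / (S.T @ S @ P + eps)
        S *= (X @ P.T) / (S @ P @ P.T + eps)
    rel_err = np.linalg.norm(X - S @ P) / np.linalg.norm(X)
    # Post-hoc normalisation so each sample's weights read as proportions.
    proportions = P / P.sum(axis=0, keepdims=True)
    return S, proportions, rel_err


X, S_true, P_true = simulate_mixture()
S_hat, P_hat, rel_err = nmf_deconvolve(X, n_types=3)
print(f"relative reconstruction error: {rel_err:.4f}")
print("estimated proportions, first sample:", np.round(P_hat[:, 0], 3))
```

The recovered cell types come back in arbitrary order, so comparing `P_hat` with `P_true` requires matching components first. A Bayesian sampler such as the one described above would instead return weighted samples from the posterior over both factors rather than a single point estimate.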

Articles published on Heterogeneous Gene Expression Data

YADA - Reference Free Deconvolution of RNA Sequencing Data

Unveiling Prognostic RNA Biomarkers through a Multi-Cohort Study in Colorectal Cancer.

SPSNet: subpopulation-sensitive network-based analysis of heterogeneous gene expression data

A sequential Monte Carlo approach to gene expression deconvolution.

Sensitivity analysis of cell-type specific differential expression detection in heterogeneous gene expression data

Harnessing gene expression networks to prioritize candidate epileptic encephalopathy genes.

Integrating heterogeneous gene expression data for gene regulatory network modelling

CleanEx: a database of heterogeneous gene expression data based on a consistent gene nomenclature.

Classification of heterogeneous gene expression data
