Abstract

Background: For heterogeneous tissues, such as blood, measurements of gene expression are confounded by the relative proportions of the cell types involved. Conclusions therefore have to rely on estimates of gene expression signals for homogeneous cell populations, obtained e.g. by micro-dissection, fluorescence-activated cell sorting, or in-silico deconfounding. We studied the feasibility and validity of a non-negative matrix decomposition algorithm using experimental gene expression data for blood and for sorted cells from the same donor samples. Our objective was to optimize the algorithm with respect to the detection of differentially expressed genes and to enable its use for classification in the difficult scenario of reversely regulated genes. This would be important for the identification of candidate biomarkers in heterogeneous tissues.

Results: Experimental data and simulation studies, involving noise parameters estimated from these data, revealed that quantile normalization and the use of non-log data are optimal for valid detection of differential gene expression. We demonstrate the feasibility of predicting the proportions of the constituent cell types from the gene expression data of single samples, as a prerequisite for a deconfounding-based classification approach. Classification cross-validation errors obtained with and without the deconfounding results are reported, as well as their dependence on sample size. An implementation of the algorithm, together with the simulation and analysis scripts, is available.

Conclusions: The deconfounding algorithm without decorrelation, using quantile normalization on non-log data, is proposed for biomarkers that are difficult to detect and for cases where confounding by varying cell type proportions is the suspected reason. In such cases, a deconfounding ranking approach can be used as a powerful alternative to, or complement of, other statistical learning approaches to define candidate biomarkers for molecular diagnosis and prediction in biomedicine, under realistically noisy conditions and with moderate sample sizes.
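The recommended pipeline (quantile normalization on non-log data, then a non-negative matrix decomposition, then prediction of cell-type contributions for single samples) can be sketched as follows. This is a minimal illustration under stated assumptions. It is not the authors' published implementation. It uses generic Lee-Seung multiplicative updates for the squared Frobenius loss, and the function names `quantile_normalize`, `nmf` and `predict_contributions` are hypothetical. Non-log data matter here because cell-type mixing is additive on the linear scale, and the logarithm of a sum is not the sum of logarithms.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in the multiplicative updates


def quantile_normalize(x):
    """Quantile-normalize the columns (samples) of a genes x samples matrix.

    Every column receives the same distribution (the mean of the sorted
    columns), placed back in that column's original rank order. Intended for
    non-log intensities, since cell-type mixing is additive on the linear scale.
    """
    x = np.asarray(x, dtype=float)
    reference = np.sort(x, axis=0).mean(axis=1)        # mean sorted profile
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)  # rank of each entry in its column
    return reference[ranks]


def nmf(x, k, n_iter=2000, seed=0):
    """Factor a non-negative matrix x (genes x samples) as w @ h.

    Uses Lee-Seung multiplicative updates for the squared Frobenius loss.
    w (genes x k) plays the role of cell-type-specific profiles, and
    h (k x samples) of per-sample cell-type contributions. Both are only
    identifiable up to permutation and scaling of the k components.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    w = rng.random((x.shape[0], k)) + EPS
    h = rng.random((k, x.shape[1])) + EPS
    for _ in range(n_iter):
        h *= (w.T @ x) / (w.T @ w @ h + EPS)
        w *= (x @ h.T) / (w @ h @ h.T + EPS)
    return w, h


def predict_contributions(w, x_new, n_iter=2000):
    """Estimate cell-type contributions for new samples (columns of x_new),
    e.g. a single sample, with the profiles w held fixed.

    Applies the same h-update as nmf(), which solves a non-negative least
    squares problem for h.
    """
    x_new = np.asarray(x_new, dtype=float).reshape(w.shape[0], -1)
    h = np.ones((w.shape[1], x_new.shape[1]))
    for _ in range(n_iter):
        h *= (w.T @ x_new) / (w.T @ w @ h + EPS)
    return h
```

Under these assumptions, one would quantile-normalize the non-log blood profiles and call `nmf` with `k` set to the assumed number of cell types. `predict_contributions` would then supply per-sample features for a downstream classifier.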

Highlights

  • For heterogeneous tissues, such as blood, measurements of gene expression are confounded by relative proportions of cell types involved

  • For studies involving heterogeneous tissue samples, detection of differential gene expression from molecular profiles, as well as identification of biomarkers, is a problem of validity: molecular profile variation and changes in cell type proportions between tissue samples are confounded [1,2,3,4]

  • Because the experimental data provided gene expression profiles for whole blood (a heterogeneous tissue that is a mixture of several cell types), together with gene expression profiles of CD3+ cells from the same samples and the respective CD3+ proportions, this information could serve as the basis for a validation study of the proposed deconfounding algorithm
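One way to run the validation described in the last highlight is to compare estimated cell-type contributions against measured CD3+ proportions. The sketch below shows one possible comparison. The function name `match_component` is hypothetical, and Pearson correlation is our assumed agreement measure, not necessarily the one used in the study. Because NMF components can come out in any order and at any scale, the code picks the best-correlated component. Pearson correlation is also unchanged when a row is multiplied by a positive constant.

```python
import numpy as np


def match_component(h, measured):
    """Find the component (row of h, k x samples) whose per-sample
    contributions best track measured proportions, e.g. FACS-derived
    CD3+ fractions for the same samples.

    Pearson r is invariant to positive rescaling of a row, so NMF's scaling
    ambiguity does not affect the comparison; the permutation ambiguity is
    resolved by taking the best-correlated row. Assumes no row of h is constant.
    Returns (row index, Pearson r).
    """
    measured = np.asarray(measured, dtype=float)
    r = [np.corrcoef(row, measured)[0, 1] for row in np.asarray(h, dtype=float)]
    best = int(np.argmax(r))
    return best, r[best]
```

A high correlation for the matched component would support the claim that the decomposition recovers the CD3+ share. The sorted CD3+ expression profiles could be compared in the same way against the matched column of the estimated profile matrix.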


Summary

Introduction

For heterogeneous tissues, such as blood, measurements of gene expression are confounded by the relative proportions of the cell types involved. For studies involving heterogeneous tissue samples, detection of differential gene expression from molecular profiles, as well as identification of biomarkers, is therefore a problem of validity: molecular profile variation and changes in cell type proportions between tissue samples are confounded [1,2,3,4]. Our objective was to optimize a non-negative matrix decomposition (deconfounding) algorithm with respect to the detection of differentially expressed genes and to enable its use for classification in the difficult scenario of reversely regulated genes. This would be important for the identification of candidate biomarkers in heterogeneous tissues.
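A toy calculation shows both problems named in this paragraph. In the first, a shift in cell type proportions creates apparent differential expression. In the second, reversely regulated genes cancel out in the bulk signal. The sketch assumes a linear mixing model on the non-log scale with two hypothetical cell types, A and B, and made-up expression values. Its final step uses ordinary least squares with known proportions. That is a simpler technique than the NMF-based decomposition, and it is included only to show why separating cell types exposes the hidden regulation.

```python
import numpy as np


def bulk_expression(profiles, proportions):
    """Linear mixing model on the non-log scale: each sample's bulk signal is
    the proportion-weighted sum of cell-type-specific expression values.

    profiles: length-k cell-type-specific expression of one gene
    proportions: samples x k matrix of cell-type proportions (rows sum to 1)
    """
    return np.asarray(proportions, dtype=float) @ np.asarray(profiles, dtype=float)


# 1) Confounding: the gene is expressed only in cell type A and is NOT
#    regulated, yet the bulk signal doubles because A's proportion doubles.
controls = bulk_expression([8.0, 0.0], [[0.25, 0.75]])   # -> [2.]
cases = bulk_expression([8.0, 0.0], [[0.5, 0.5]])        # -> [4.]

# 2) Reverse regulation: up in A (8 -> 16), down in B (16 -> 8). With the same
#    proportions in both groups, the group means of the bulk signal coincide.
props = np.array([[0.25, 0.75], [0.5, 0.5], [0.75, 0.25]])
group1 = bulk_expression([8.0, 16.0], props)   # -> [14., 12., 10.]
group2 = bulk_expression([16.0, 8.0], props)   # -> [10., 12., 14.]
print(group1.mean(), group2.mean())            # -> 12.0 12.0

# 3) With per-sample proportions known, least squares recovers the
#    cell-type-specific values and exposes the opposite regulation.
est1 = np.linalg.lstsq(props, group1, rcond=None)[0]   # ~ [8., 16.]
est2 = np.linalg.lstsq(props, group2, rcond=None)[0]   # ~ [16., 8.]
```

A bulk-level test would see no difference for this gene, even though it is regulated in both cell types. This is the scenario in which the deconfounding approach is meant to help.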
