Abstract

Complex tissues are composed of many different types of cells, each involved in a multitude of biological processes. Consequently, an important component of understanding such processes is understanding the cell-type composition of the tissues. Estimating cell-type composition from high-throughput gene expression data is known as cell-type deconvolution. In this paper we first summarize the extensive deconvolution literature by identifying a common regression-like approach to deconvolution. We call this approach the unified deconvolution-as-regression (UDAR) framework. While methods that fall under this framework all use a similar model, they fit it using data on different scales. Two popular scales for gene expression data are logarithmic and linear. Unfortunately, each of these scales has problems in the UDAR framework: using log-scale gene expression implies a biologically implausible model, while using linear-scale gene expression leads to statistically inefficient estimators. To address these issues, we consider how deconvolution methods may use an adjusted model that is a hybrid of the two scales. In analyses of simulations as well as a collection of eleven real benchmark datasets, we find that a prototypical hybrid-scale adjustment to the UDAR framework improves statistical efficiency and robustness. More broadly, we believe these hybrid-scale modeling principles may be incorporated into many existing deconvolution methods.

Highlights

  • The tissues of multi-cellular organisms are typically composed of many types of cells

  • To compare the hybrid approach with the unified deconvolution-as-regression (UDAR) model, we evaluate both methods on simulated heterogeneous mixtures of cells

  • R consists of linear-scale read counts

Summary

Introduction

The tissues of multi-cellular organisms are typically composed of many types of cells. For this reason, methods to estimate cell-type proportions from high-throughput genomics data have been extensively studied over the past two decades (for comprehensive literature reviews see Gaujoux (2013) or Mohammadi et al. (2015)). Given gene expression data from a sample composed of a mixture of cell types, deconvolution methods estimate the proportions of the constituent cell types. The analyses in this paper, on simulations and on real benchmark datasets, show that the proposed hybrid approach produces accurate and robust estimates of cell-type proportions.
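To make the regression view of deconvolution concrete, the following is a minimal sketch, not the paper's UDAR or hybrid model. It assumes only two cell types, noiseless data, proportions that sum to one, and made-up expression values; the names `fit_proportion`, `type_a`, and `type_b` are hypothetical. It illustrates why the choice of scale matters: a mixture is a weighted sum of cell-type profiles on the linear scale, but the logarithm of a weighted sum is not the weighted sum of the logarithms.

```python
import math


def fit_proportion(mixture, profile_a, profile_b):
    """Least-squares estimate of p in: mixture ~ p*profile_a + (1-p)*profile_b."""
    # Rearranged: (mixture - profile_b) ~ p * (profile_a - profile_b),
    # a one-parameter regression through the origin.
    num = sum((m - b) * (a - b) for m, a, b in zip(mixture, profile_a, profile_b))
    den = sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))
    return num / den


def to_log2(values):
    """Convert linear-scale expression values to log2 scale."""
    return [math.log2(v) for v in values]


# Hypothetical linear-scale expression profiles for two cell types (5 genes).
type_a = [100.0, 20.0, 5.0, 400.0, 50.0]
type_b = [10.0, 200.0, 50.0, 40.0, 50.0]
true_p = 0.3  # assumed true proportion of cell type A

# Cells mix additively on the linear scale.
mixture = [true_p * a + (1 - true_p) * b for a, b in zip(type_a, type_b)]

# Fitting on the linear scale recovers the true proportion in this noiseless case.
p_linear = fit_proportion(mixture, type_a, type_b)

# Fitting the same model to log-transformed data assumes that log expression
# mixes additively, which does not hold here, so the estimate is biased.
p_log = fit_proportion(to_log2(mixture), to_log2(type_a), to_log2(type_b))

print(f"true p = {true_p}, linear-scale fit = {p_linear:.4f}, log-scale fit = {p_log:.4f}")
```

In this toy setting the linear-scale fit matches the true proportion while the log-scale fit does not, even without noise. The sketch does not show the other side of the trade-off, namely that linear-scale fits can be statistically inefficient on noisy data.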

A Unified Framework for Existing Deconvolution Models
A Hybrid Model for Deconvolution
Results
Conclusion
