Abstract

The accuracy of current deconvolution methods relies largely on the quality of cell-type expression references. However, the single-cell (sc) and single-nucleus (sn) RNA-seq data used to build these references are usually generated in studies independent of the bulk RNA-seq data to be deconvolved. This study design inherently introduces technical confounding factors as unwanted variation, which current methods do not fully address. To evaluate the impact of this variation on deconvolution accuracy, we generated a benchmark dataset in which bulk and snRNA-seq profiling were performed on the same aliquot of single nuclei extracted from 24 healthy retina samples. All donor eye samples were collected within six hours post-mortem and were free of disease. This study design guarantees that the matched sequencing data share the same cell-type composition, so that cross-platform technical artifacts become the remaining confounding factor. We used the benchmark dataset to evaluate seven current deconvolution methods and found that they performed much worse on matched real bulk data than on matched pseudo-bulks, which are summations of the single-cell data. This finding suggests that none of these methods fully addresses the major technical artifacts between bulk and single-cell sequencing platforms. We therefore propose DeMix.SC, a new deconvolution framework that optimizes deconvolution parameters using a small set of matched bulk and sc/snRNA-seq data from the same tissue type. DeMix.SC includes two major steps. First, we measure the technical variation across genes and across platforms using the benchmark data. Second, we introduce a new per-gene weight function that produces a gene ranking accounting for both platform-specific technical variation and cell-type-specific expression at the gene level.
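As an illustration of the pseudo-bulk construction mentioned above, the following is a minimal sketch, assuming a dense cells-by-genes count matrix with one sample label and one cell-type label per nucleus. The function names `make_pseudo_bulk` and `cell_type_proportions` are hypothetical and are not part of DeMix.SC; the sketch only shows how summing single-cell counts per sample yields a pseudo-bulk profile, and how counting nuclei per cell type gives the reference composition against which a deconvolution estimate could be compared.

```python
import numpy as np


def make_pseudo_bulk(counts, sample_ids):
    """Sum per-cell counts (cells x genes) within each sample.

    Hypothetical helper for illustration; not the DeMix.SC API.
    Returns the sorted sample labels and a samples x genes matrix.
    """
    counts = np.asarray(counts)
    samples, inverse = np.unique(np.asarray(sample_ids), return_inverse=True)
    pseudo = np.zeros((samples.size, counts.shape[1]), dtype=counts.dtype)
    # Unbuffered add: rows sharing a sample index accumulate into one row.
    np.add.at(pseudo, inverse, counts)
    return samples, pseudo


def cell_type_proportions(sample_ids, cell_types):
    """Fraction of nuclei of each cell type within each sample.

    Assumes nucleus counts are an acceptable proxy for composition.
    Returns sample labels, cell-type labels, and a samples x types matrix.
    """
    samples, s_idx = np.unique(np.asarray(sample_ids), return_inverse=True)
    types, t_idx = np.unique(np.asarray(cell_types), return_inverse=True)
    table = np.zeros((samples.size, types.size))
    np.add.at(table, (s_idx, t_idx), 1)
    return samples, types, table / table.sum(axis=1, keepdims=True)


# Toy input: 4 nuclei, 2 genes, 2 samples (made-up numbers, not study data).
toy_counts = np.array([[1, 0], [2, 1], [0, 3], [4, 4]])
toy_samples = ["s1", "s1", "s2", "s2"]
toy_types = ["rod", "cone", "rod", "rod"]

sample_names, pseudo_bulk = make_pseudo_bulk(toy_counts, toy_samples)
_, type_names, proportions = cell_type_proportions(toy_samples, toy_types)
```

Because a pseudo-bulk built this way shares its platform with the single-cell reference, it carries none of the cross-platform artifacts present in real bulk data, which is consistent with the performance gap reported above.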
Using the retina benchmark data, we applied DeMix.SC to previously published human retinal RNA-seq data from 523 individuals at different stages of age-related macular degeneration (AMD). We observed that DeMix.SC can accurately capture the cell-type composition shifts in the AMD retina. DeMix.SC revealed a significant drop in rod cells as well as increases in astrocytes, bipolar cells, and Müller cells in the AMD retina compared with the non-AMD group. The proportion changes of the latter three minor cell types were not identified by other methods, whereas DeMix.SC revealed this trend. In summary, DeMix.SC integrates benchmark data to improve deconvolution accuracy in retina samples. Our method is generic and can be applied to other disease conditions, for example to decipher cell-type heterogeneity in cancer. We expect that DeMix.SC will help revolutionize downstream cell-type-specific analysis of bulk RNA-seq data and help identify cellular targets of human diseases.

Citation Format: Shuai Guo, Xuesen Cheng, Andrew Koval, Shuangxi Ji, Qingnan Liang, Yumei Li, Leah A. Owen, Ivana K. Kim, John Weinstein, Scott Kopetz, John Paul Shen, Margaret M. DeAngelis, Rui Chen, Wenyi Wang. Integration with benchmark data of paired bulk and single-cell RNA sequencing data substantially improves the accuracy of bulk tissue deconvolution. [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 4273.