Abstract
While our understanding of cellular and molecular processes has grown exponentially, issues related to the cell microenvironment and cellular heterogeneity have sparked a new debate concerning cell identity. Cell composition (chromatin and nuclear architecture) is highly susceptible to dynamic changes in disease conditions. Since chromatin accessibility patterns play a major role in human diseases, a deconvolution tool based on open chromatin data is expected to perform better in identifying cell composition. Herein, we have designed the deconvolution tool “DeconPeaker,” which can precisely define the uniqueness among subpopulations of cells using open chromatin datasets. Using this tool, we simultaneously evaluated chromatin accessibility and gene expression datasets to estimate cell types and their respective proportions in mixture samples. In comparison to other known deconvolution methods, we observed the lowest average root-mean-square error (RMSE = 0.042) and the highest average correlation coefficient (r = 0.919) between the predicted and “true” proportions. As a proof-of-concept, we also tested chromatin accessibility data from acute myeloid leukemia (AML) and successfully obtained unique cell types associated with AML progression. Furthermore, we showed that chromatin accessibility represents more essential characteristics in the identification of cell types than gene expression. Taken together, DeconPeaker is a powerful tool with the potential to combine different datasets (primarily, chromatin accessibility and gene expression) and define different cell types in mixtures. The Python package of DeconPeaker is now available at https://github.com/lihuamei/DeconPeaker.
Highlights
Human diseases are multifactorial and complex processes in which genetic–epigenetic components are significantly involved
Data processing in this tool involves three main steps: (1) identification of a list of nonoverlapping cell type-specific peaks (CTSPs) from the reference samples using a hypothesis test framework, followed by construction of a signature matrix by minimizing the condition number; (2) deconvolution of the mixtures with the signature matrix using SIMPLS; and (3) evaluation of the deconvolution using an asymptotic test for consistency of the distributions between observations and predictions
The signature matrices identified by DeconPeaker consistently showed the lowest average root-mean-square error (RMSE) and the highest average Pearson correlation coefficient (PCC), indicating greater reliability and a broader range of applications for DeconPeaker
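The three steps listed in the highlights can be illustrated with a minimal sketch. This is not the DeconPeaker implementation: the function names, the toy data, and the candidate signature sizes are all hypothetical. Two simplifications are assumed: peak specificity is scored by a simple difference against the other cell types rather than by a hypothesis test, and the mixture is solved with ordinary least squares (clipped to be non-negative and renormalized) rather than SIMPLS. The evaluation uses RMSE and PCC, the two metrics named above, and the mixture is noise-free so the recovery is essentially exact.

```python
import numpy as np


def select_signature(reference, sizes):
    """Choose peaks per cell type so the signature matrix has the lowest condition number.

    reference: (n_peaks, n_cell_types) mean accessibility per cell type.
    sizes: candidate numbers of top-ranked peaks to keep per cell type.
    Assumption: specificity = value in one cell type minus the max in the others.
    """
    n_types = reference.shape[1]
    best_cond, best_idx = None, None
    for k in sizes:
        chosen = []
        for j in range(n_types):
            others = np.delete(reference, j, axis=1).max(axis=1)
            score = reference[:, j] - others
            chosen.extend(np.argsort(score)[::-1][:k])
        idx = np.unique(chosen)
        cond = np.linalg.cond(reference[idx])
        if best_cond is None or cond < best_cond:
            best_cond, best_idx = cond, idx
    return best_idx, best_cond


def deconvolve(signature, mixture):
    """Estimate proportions by least squares (a stand-in for SIMPLS)."""
    coef, *_ = np.linalg.lstsq(signature, mixture, rcond=None)
    coef = np.clip(coef, 0.0, None)  # proportions cannot be negative
    total = coef.sum()
    return coef / total if total > 0 else coef


def rmse(truth, estimate):
    """Root-mean-square error between true and estimated proportions."""
    return float(np.sqrt(np.mean((truth - estimate) ** 2)))


def pcc(truth, estimate):
    """Pearson correlation coefficient between true and estimated proportions."""
    return float(np.corrcoef(truth, estimate)[0, 1])


# Toy reference: 60 peaks, 3 cell types, with 10 strongly specific peaks each.
rng = np.random.default_rng(0)
reference = rng.uniform(0.0, 1.0, size=(60, 3))
for j in range(3):
    reference[10 * j:10 * (j + 1), j] += 5.0

# Noise-free synthetic mixture with known ("true") proportions.
true_props = np.array([0.5, 0.3, 0.2])
mixture = reference @ true_props

peak_idx, cond = select_signature(reference, sizes=(5, 10, 20))
estimated = deconvolve(reference[peak_idx], mixture[peak_idx])

print("peaks kept:", len(peak_idx), "condition number:", round(float(cond), 3))
print("estimated proportions:", np.round(estimated, 3))
print("RMSE:", rmse(true_props, estimated), "PCC:", pcc(true_props, estimated))
```

With real data the mixture would contain noise and unmodeled cell types, so the estimates would not match the truth this closely; the sketch only shows how the signature selection, deconvolution, and evaluation steps fit together.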
Summary
Human diseases are multifactorial and complex processes in which genetic–epigenetic components are significantly involved. The lack of defined gene signatures and biological characteristics for bulk tissues from distinct histological subtypes leads to suboptimal or mediocre results in studies of human diseases (Amit et al, 2020). Several disease association studies have suggested that cell type composition is a confounding factor (Newman et al, 2015). Embryogenesis, morphogenesis, cell differentiation, and growth are directly associated with changes in cell type composition (Hunt et al, 2019). Publicly available databases such as The Cancer Genome Atlas (TCGA) contain thousands of profiled samples. These samples were generated as mixtures from bulk sequencing. Resolving cell types and compositions from these available samples will facilitate our understanding of biological mechanisms. Adequate methods are needed to identify the correct cell types and compositions from a mixture.