Abstract

We introduce a new computational method, EMeth, to estimate cell type proportions from DNA methylation data. EMeth is a reference-based method: it requires cell type-specific DNA methylation data from the relevant cell types. It improves on existing reference-based methods by detecting the CpGs whose DNA methylation is inconsistent with the deconvolution model and reducing their contribution to cell type decomposition. Another novel feature of EMeth is that it accommodates a cell type with known proportions but no reference profile, and estimates that cell type's methylation. This is motivated by studies of methylation in tumor cells: bulk tumor samples contain tumor cells as well as other cell types, such as infiltrating immune cells, and the tumor cell proportion can be estimated from copy number data. Using simulated data and in silico mixtures, we demonstrate that EMeth delivers more accurate estimates of cell type proportions than several other methods. Applications in cancer studies show that the proportions of T regulatory cells estimated from DNA methylation have the expected associations with mutation load and survival time, while the estimates from gene expression miss these associations.

Highlights

  • Almost all bulk tissue samples are composed of multiple cell types

  • The input data of EMeth include the methylation data from bulk tissue samples and the reference data of cell type-specific DNA methylation in a pre-defined set of CpGs. These CpGs can be divided into two groups: the consistent CpGs, on which the DNA methylation in the bulk sample is consistent with what is expected from the deconvolution model, and the aberrant CpGs, on which it is not

  • We studied immune cell type composition for four cancer types using the gene expression data and DNA methylation data (Illumina 450k array) from The Cancer Genome Atlas (TCGA)


Summary

Results

EMeth models the DNA methylation data in a tissue sample with a two-component normal mixture of regressions, with one component for consistent CpGs and one for aberrant CpGs. We evaluated different methods by directly mixing individual-specific and cell type-specific DNA methylation data to generate pseudo tissue samples. This approach captures the variation of cell type-specific DNA methylation across individuals and does not rely on any distributional assumption. Both EMeth and RLS give accurate estimates (correlation with true cell type proportions as high as 0.95 and RMSE around 10⁻³ for each cell type), and they consistently outperform other methods (Fig. 2). To estimate cell type proportions using gene expression data, we applied CIBERSORTx [26] using its default
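The in silico mixture evaluation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration. The reference profiles and per-individual variation are synthetic (the study mixes real individual-specific data and makes no distributional assumption). The estimator is a plain least-squares baseline with clipping and renormalization, not EMeth or RLS. The function names and parameters (`make_pseudo_bulk`, `individual_sd`, and so on) are hypothetical.

```python
import numpy as np


def simulate_reference(n_cpgs, n_cell_types, rng):
    """Hypothetical cell type-specific mean methylation (beta values in [0, 1])."""
    return rng.uniform(0.0, 1.0, size=(n_cpgs, n_cell_types))


def make_pseudo_bulk(reference, proportions, rng, individual_sd=0.02):
    """Mix per-individual cell type profiles into pseudo tissue samples.

    reference:   (n_cpgs, n_cell_types) mean methylation per cell type
    proportions: (n_samples, n_cell_types) true proportions, rows sum to 1
    """
    n_samples = proportions.shape[0]
    bulk = np.empty((reference.shape[0], n_samples))
    for i in range(n_samples):
        # Assumption: a noisy copy of the reference stands in for the
        # individual-specific cell type data used in the actual study.
        noise = rng.normal(0.0, individual_sd, reference.shape)
        individual = np.clip(reference + noise, 0.0, 1.0)
        bulk[:, i] = individual @ proportions[i]
    return bulk


def estimate_proportions(bulk, reference):
    """Baseline estimator (not EMeth): least squares, clip, renormalize."""
    coef, *_ = np.linalg.lstsq(reference, bulk, rcond=None)
    coef = np.clip(coef.T, 0.0, None)  # shape (n_samples, n_cell_types)
    return coef / coef.sum(axis=1, keepdims=True)


def evaluate(true_props, est_props):
    """Per-cell-type Pearson correlation and RMSE across samples."""
    n_types = true_props.shape[1]
    corr = np.array([
        np.corrcoef(true_props[:, k], est_props[:, k])[0, 1]
        for k in range(n_types)
    ])
    rmse = np.sqrt(np.mean((true_props - est_props) ** 2, axis=0))
    return corr, rmse


rng = np.random.default_rng(0)
reference = simulate_reference(n_cpgs=500, n_cell_types=4, rng=rng)
true_props = rng.dirichlet(np.ones(4), size=50)
bulk = make_pseudo_bulk(reference, true_props, rng)
est_props = estimate_proportions(bulk, reference)
corr, rmse = evaluate(true_props, est_props)
print("correlation per cell type:", np.round(corr, 3))
print("RMSE per cell type:", np.round(rmse, 4))
```

Because the true proportions are known by construction, any estimator can be swapped in for `estimate_proportions` and scored with the same two metrics. The numbers this sketch prints depend on the synthetic settings and are not comparable to the results reported for EMeth.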

