Abstract

Characterizing and interpreting heterogeneous mixtures at the cellular level is a critical problem in genomics. Single-cell assays offer an opportunity to resolve cellular-level heterogeneity: scRNA-seq enables single-cell expression profiling, and scATAC-seq identifies active regulatory elements. Furthermore, while scHi-C can measure the chromatin contacts (i.e., loops) between active regulatory elements and their target genes in single cells, bulk HiChIP can measure such contacts at a higher resolution. In this work, we introduce DC3 (De-Convolution and Coupled-Clustering), a method for the joint analysis of various bulk and single-cell data, such as HiChIP, RNA-seq and ATAC-seq, from the same heterogeneous cell population. DC3 can simultaneously identify distinct subpopulations, assign single cells to the subpopulations (i.e., clustering) and de-convolve the bulk data into subpopulation-specific data. The subpopulation-specific profiles of gene expression, chromatin accessibility and enhancer-promoter contact obtained by DC3 provide a comprehensive characterization of the gene regulatory system in each subpopulation.

Highlights

  • Characterizing and interpreting heterogeneous mixtures at the cellular level is a critical problem in genomics

  • We examined data from various cell lines and found that HiChIP loop counts are generally positively correlated with both gene expression values from RNA-seq (Supplementary Fig. 1) and enhancer openness from ATAC-seq (Supplementary Fig. 2)

  • We developed DC3 for simultaneous deconvolution and coupled clustering based on the joint analysis of different combinations of bulk and single-cell level RNA-seq, ATAC-seq, and HiChIP data


Summary

Results

We examined data from various cell lines and found that HiChIP loop counts are generally positively correlated with both gene expression values from RNA-seq (Supplementary Fig. 1) and enhancer openness from ATAC-seq (Supplementary Fig. 2). This observation motivated us to couple the three data types through a linear relation between the loop count and the product of gene expression and enhancer openness, which gives rise to the first term of the cost function.

When only one data type is available in single cells (input settings 3 and 4) and the dropout rate is high, we cannot obtain significantly better performance than random deconvolution.

Input combinations:

  1. scRNA-seq, scATAC-seq and scHi-C
  2. scRNA-seq, scATAC-seq and bulk Hi-C
  3. scRNA-seq, bulk ATAC-seq and bulk Hi-C
  4. Bulk RNA-seq, scATAC-seq and bulk Hi-C

Baseline: random deconvolution
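
The linear relation between loop counts and the product of gene expression and enhancer openness can be illustrated with a minimal sketch. This is not the DC3 cost function, which is not given in this summary. It only shows one plausible form of such a coupling term for a single subpopulation, under the assumptions that a single scale factor `alpha` links the two sides and that the mismatch is measured by a squared error. The function name `coupling_term`, the variable names, and the synthetic data are all hypothetical.

```python
import numpy as np


def coupling_term(loops, expression, openness):
    """Fit loops ~ alpha * outer(expression, openness) and return the misfit.

    loops:      (n_genes, n_enhancers) loop counts (hypothetical layout)
    expression: (n_genes,) gene expression values
    openness:   (n_enhancers,) enhancer openness values
    Returns the least-squares scale alpha and the squared residual.
    """
    # Product of expression and openness for every gene-enhancer pair.
    product = np.outer(expression, openness)
    denom = np.sum(product * product)
    # Closed-form least-squares estimate of the single scale factor.
    alpha = np.sum(loops * product) / denom if denom > 0 else 0.0
    residual = loops - alpha * product
    return float(alpha), float(np.sum(residual ** 2))


# Synthetic example (assumed data): loops built exactly from the linear relation.
rng = np.random.default_rng(0)
expression = rng.gamma(2.0, 1.0, size=5)
openness = rng.gamma(2.0, 1.0, size=4)
loops = 3.0 * np.outer(expression, openness)

alpha, cost = coupling_term(loops, expression, openness)
print(alpha, cost)
```

When the loop counts follow the assumed relation exactly, the fitted scale recovers the factor used to generate them and the residual is numerically zero. On real data the residual would measure how far a candidate subpopulation profile departs from the assumed coupling.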

