Gene regulation is inherently multiscale, but scale-adaptive machine learning methods that fully exploit this property in single-nucleus accessibility data are still lacking. Here, we develop ChromatinHD, a pair of scale-adaptive models that use the raw accessibility data, without peak calling or windows, to link regions to gene expression and determine differentially accessible chromatin. We show that ChromatinHD consistently outperforms existing peak- and window-based approaches and find that this is due to a large number of uniquely captured, functional accessibility changes within and outside of putative cis-regulatory regions. Furthermore, ChromatinHD can delineate collaborating regulatory regions, including their preferential genomic conformations, that drive gene expression. Finally, our models also use changes in ATAC-seq fragment lengths to identify dense binding of transcription factors, a feature not captured by footprinting methods. Altogether, ChromatinHD, available at https://chromatinhd.org, is a suite of computational tools that enables a data-driven understanding of chromatin accessibility at various scales and how it relates to gene expression.