Nucleosomes represent elementary building units of eukaryotic chromosomes and consist of DNA wrapped around a histone octamer flanked by linker DNA segments. Nucleosomes are central in epigenetic pathways and their genomic positioning is associated with regulation of gene expression, DNA replication, DNA methylation and DNA repair, among other functions. Building on prior discoveries that DNA sequences noticeably affect nucleosome positioning, our objective is to identify nucleosome positions and related features across entire genome. Here, we introduce an interpretable framework based on the concepts of deep residual networks (NuPoSe). Trained on high-coverage human experimental MNase-seq data, NuPoSe is able to learn sequence and structural patterns associated with nucleosome organization in human genome. NuPoSe can bealsoapplied to unseen data from different organisms and cell types. Our findings point to 43 informative features, most of them constitute tri-nucleotides, di-nucleotides and one tetra-nucleotide. Most features are significantly associated with the nucleosomal structuralcharacteristics, namely, periodicity of nucleosomal DNA and its location with respect to a histone octamer. Importantly, we show that features derived from the 27 bp linker DNAflanking nucleosomescontribute up to 10% to the quality of the prediction model. This,alongwith the comprehensive training sets, deep-learning architecture, and feature selection method,may contribute totheNuPoSe's 80-89% classificationaccuracy on different independentdatasets.
Read full abstract