TempoMAGE: a deep learning framework that exploits the causal dependency between time-series data to predict histone marks in open chromatin regions at time-points with missing ChIP-seq datasets.

Mohammad Hallal, Pierre Khoueiry, Mariette Awad, Tobias Marschall

doi:10.1093/bioinformatics/btab513

Abstract

Identifying histone tail modifications using ChIP-seq is common in time-series experiments in development and disease. These assays, however, cover specific time-points, leaving intermediate or early stages with missing information. Although several machine learning methods have been developed to predict histone marks, none has exploited the dependence that exists in time-series experiments between data generated at specific time-points to extrapolate these findings to time-points where data cannot be generated because material is lacking or scarce (e.g. early developmental stages). Here, we train a deep learning model, named TempoMAGE, to predict the presence or absence of H3K27ac in open chromatin regions by integrating information from sequence, gene expression, chromatin accessibility and the estimated change in H3K27ac state from a reference time-point. We show that adding reference time-point information systematically improves the model's overall performance. In addition, sequence signatures extracted from our method were exclusive to the training dataset, indicating that our model learned data-specific features. As an application, TempoMAGE was able to predict the activity of enhancers from a pre-validated in vivo dataset, highlighting its potential for functional annotation of putative enhancers. TempoMAGE is freely available through GitHub at https://github.com/pkhoueiry/TempoMAGE. Supplementary data are available at Bioinformatics online.
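To make the four input types named in the abstract concrete, the following is a minimal sketch of how one open chromatin region might be represented before being passed to a classifier. It is an illustration only, not the TempoMAGE implementation: the `RegionExample` class, the `one_hot` and `to_inputs` helpers, the field names, the units and the example values are all hypothetical, and the actual encoding used by the authors may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple

BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """One-hot encode a DNA sequence; unknown bases (e.g. N) become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


@dataclass
class RegionExample:
    """Hypothetical container for one open chromatin region at one time-point."""
    sequence: str            # DNA sequence of the region
    expression: float        # expression of an associated gene (assumed scalar)
    accessibility: float     # chromatin accessibility signal (assumed scalar)
    reference_change: float  # estimated change in H3K27ac state vs. the reference time-point
    label: int               # 1 = H3K27ac present, 0 = absent


def to_inputs(ex: RegionExample) -> Tuple[List[List[int]], List[float]]:
    """Split an example into a sequence input and an auxiliary feature vector."""
    seq_input = one_hot(ex.sequence)
    aux_input = [ex.expression, ex.accessibility, ex.reference_change]
    return seq_input, aux_input


# Made-up values, for illustration only.
example = RegionExample("ACGTN", expression=2.5, accessibility=0.8,
                        reference_change=1.0, label=1)
seq_input, aux_input = to_inputs(example)
```

In this sketch the sequence and the scalar features are kept as separate inputs so that a model could process the sequence with its own layers before combining it with the other signals; that separation is a design assumption of the example, not a description of the published architecture.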

Full Text