Abstract

Background: Deep learning has proven to be a powerful technique for transcription factor (TF) binding prediction but requires large training datasets. Transfer learning can reduce the amount of data required for deep learning, while improving overall model performance, compared to training a separate model for each new task.

Results: We assess a transfer learning strategy for TF binding prediction consisting of a pre-training step, wherein we train a multi-task model with multiple TFs, and a fine-tuning step, wherein we initialize single-task models for individual TFs with the weights learned by the multi-task model, after which the single-task models are trained at a lower learning rate. We corroborate that transfer learning improves model performance, especially if in the pre-training step the multi-task model is trained with biologically relevant TFs. We show the effectiveness of transfer learning for TFs with ~500 ChIP-seq peak regions. Using model interpretation techniques, we demonstrate that the features learned in the pre-training step are refined in the fine-tuning step to resemble the binding motif of the target TF (i.e., the recipient of transfer learning in the fine-tuning step). Moreover, pre-training with biologically relevant TFs allows single-task models in the fine-tuning step to learn useful features other than the motif of the target TF.

Conclusions: Our results confirm that transfer learning is a powerful technique for TF binding prediction.
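
The two-step strategy described above (multi-task pre-training, then single-task fine-tuning from the pre-trained weights at a lower learning rate) can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the model is a single shared tanh unit with logistic output heads rather than the deep network used in the paper, the data are synthetic, the learning rates and epoch counts are arbitrary, and giving the single-task model a freshly initialized output head is one plausible reading of the setup, not a detail confirmed by the text.

```python
import copy
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(shared, head, x):
    """Shared layer (one tanh unit) followed by a task-specific logistic head."""
    h = math.tanh(sum(w * xi for w, xi in zip(shared, x)))
    return h, sigmoid(head["a"] * h + head["b"])


def train(shared, heads, X, Y, lr, epochs):
    """Full-batch gradient descent on binary cross-entropy summed over tasks.

    Y[i][t] is 1 if TF t binds region i, else 0.
    `shared` and `heads` are updated in place.
    """
    n = len(X)
    for _ in range(epochs):
        g_shared = [0.0] * len(shared)
        g_heads = [{"a": 0.0, "b": 0.0} for _ in heads]
        for x, y in zip(X, Y):
            for t, head in enumerate(heads):
                h, p = forward(shared, head, x)
                err = p - y[t]  # gradient of the loss w.r.t. the logit
                g_heads[t]["a"] += err * h
                g_heads[t]["b"] += err
                dh = err * head["a"] * (1.0 - h * h)  # back through tanh
                for j, xj in enumerate(x):
                    g_shared[j] += dh * xj
        for j in range(len(shared)):
            shared[j] -= lr * g_shared[j] / n
        for head, g in zip(heads, g_heads):
            head["a"] -= lr * g["a"] / n
            head["b"] -= lr * g["b"] / n


# Synthetic toy data (hypothetical): 40 "regions" with 8 binary features each.
random.seed(0)
X = [[random.choice([0.0, 1.0]) for _ in range(8)] for _ in range(40)]
Y_multi = [[int(x[0] + x[1] >= 1), int(x[1] + x[2] >= 1), int(x[0] + x[2] >= 1)]
           for x in X]                                  # three pre-training TFs
Y_target = [[int(x[0] + x[1] + x[2] >= 2)] for x in X]  # one target TF

PRETRAIN_LR, FINETUNE_LR = 0.5, 0.05  # arbitrary values; only the ordering matters

# Pre-training step: one shared layer, one output head per TF.
shared_multi = [random.uniform(-0.1, 0.1) for _ in range(8)]
heads_multi = [{"a": random.uniform(-0.1, 0.1), "b": 0.0} for _ in range(3)]
train(shared_multi, heads_multi, X, Y_multi, PRETRAIN_LR, epochs=200)

# Fine-tuning step: copy the pre-trained shared weights into a single-task
# model, attach a new head (assumption), and train at the lower learning rate
# on a small subset standing in for a TF with few peaks.
shared_single = copy.deepcopy(shared_multi)
init_single = list(shared_single)  # snapshot of the transferred weights
head_single = [{"a": 0.1, "b": 0.0}]
train(shared_single, head_single, X[:10], Y_target[:10], FINETUNE_LR, epochs=200)
```

Copying the weights rather than sharing the list keeps the pre-trained model intact, so the same multi-task model could initialize a separate single-task model for each target TF.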

Highlights

  • Deep learning has proven to be a powerful technique for transcription factor (TF) binding prediction but requires large training datasets

  • We observe that the features learned in the pre-training step are refined in the fine-tuning step to resemble the motif of the target TF, and pre-training with biologically relevant TFs allows the model to learn useful features other than the motif of the target TF in the fine-tuning step, such as the motifs of cofactors

  • Deep learning-based TF binding prediction can be treated as a binary classification task over a sparse matrix of TF binding events across accessible genomic regions, wherein the ones and zeros represent whether or not a TF binds to a genomic region
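
As a small illustration of the binary labelling described in the last highlight, the sketch below builds a region-by-TF matrix of ones and zeros from a set of observed binding events. The region and TF names, the `label_matrix` helper, and the choice to treat every unobserved pair as 0 are assumptions made for this example only.

```python
def label_matrix(regions, tfs, bound):
    """Rows are genomic regions, columns are TFs; 1 means the TF binds the region."""
    return [[1 if (region, tf) in bound else 0 for tf in tfs] for region in regions]


# Hypothetical identifiers, not taken from any dataset.
regions = ["region_1", "region_2", "region_3"]
tfs = ["TF_A", "TF_B"]
bound = {("region_1", "TF_A"), ("region_3", "TF_B")}

labels = label_matrix(regions, tfs, bound)
for region, row in zip(regions, labels):
    print(region, row)
```

Each column of such a matrix is the target vector for one single-task model, while the full matrix is the target for a multi-task model.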

Introduction

Deep learning has proven to be a powerful technique for transcription factor (TF) binding prediction but requires large training datasets. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is an experimental assay that enables the identification of TF-bound regions in vivo at a resolution of a few hundred base pairs (bp) [6]. These regions, known as ChIP-seq peaks, are expected to be enriched for TF binding sites (TFBSs). The ReMap database has compiled and uniformly reprocessed thousands of public ChIP-seq datasets [7, 8]. Based on ReMap, the UniBind database stores reliable TFBS predictions from four different computational models, including position weight matrices (PWMs; reviewed in [9]), for the ChIP-seq peaks of 231 human TFs in 315 different human cell and tissue types [10].
