Abstract

Understanding the specific interactions between transcription factors (TFs) and DNA is essential for understanding regulatory processes in biological systems. Recently, deep learning algorithms have outperformed conventional approaches in predicting the sequence specificities of DNA-protein binding, offering an alternative to time-consuming and expensive experimental methods such as ChIP-seq. However, because TF binding is cell-type specific, most current deep learning methods build one model for each TF-cell line combination. This makes the many resulting models complex to maintain, and models for cell lines without enough ChIP-seq data often predict poorly. It is therefore useful to build models that are both more accurate and more widely applicable. We propose a method to build a series of Convolutional Neural Network (CNN)-based models grouped by TF, which we call TF models. Trained on the same database of 554 ChIP-seq datasets, the proposed TF models outperform DeepBind in the motif discovery task. On the one hand, the number of models is reduced from 554 to 72, which extends the application scope of each model. On the other hand, TF models achieve a higher AUC than DeepBind on 94.2% of TF-cell line combinations. Moreover, we demonstrate that TF models achieve an average AUC of 0.909 when predicting the binding of TFs in cell lines that lack ChIP-seq data.
