Abstract

The prediction of transcription factor binding sites (TFBS) plays a crucial role in studying cellular functions and understanding transcriptional regulatory processes. With the development of chromatin immunoprecipitation sequencing (ChIP-seq) technology, an increasing number of computational TFBS prediction models have emerged. However, integrating multi-modal DNA information and extracting informative features to improve prediction accuracy remains a major challenge. Here, we propose MultiTF, a multi-modal representation learning method based on a cross-attention network for predicting transcription factor binding sites. Among TFBS prediction methods, MultiTF is the first to use graph neural networks and cross-attention networks for representation learning. MultiTF uses dna2vec to extract global contextual features of DNA sequences, DNAshapeR to extract shape features, and the CDPfold model together with a graph attention network to learn and represent DNA structural features. Finally, our cross-attention module combines the sequence, structural, and shape features through interactive fusion. In a comparison with other state-of-the-art methods on 165 ENCODE ChIP-seq datasets, MultiTF achieves average accuracy (ACC), ROC-AUC, and PR-AUC values of 0.911, 0.978, and 0.982, respectively. These results show that MultiTF achieves higher prediction accuracy than previous TFBS prediction models. In addition, our visual analysis of structural features provides interpretability for the prediction results.
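
To make the idea of cross-attention fusion concrete, the following is a minimal NumPy sketch of single-head scaled dot-product cross-attention in which one modality (here, sequence features) attends to another (here, shape features). It is an illustration only and not the MultiTF implementation: the function names, the feature dimensions, the sequence lengths, and the random projection matrices are all hypothetical assumptions, and a trained model would learn the projections rather than sample them.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    Queries come from one modality and keys/values from another,
    so each query position gathers information from the other modality.
    """
    q = query_feats @ w_q        # (n_query, d_model)
    k = context_feats @ w_k      # (n_context, d_model)
    v = context_feats @ w_v      # (n_context, d_model)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ v, weights


# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
n_seq, n_shape = 99, 101            # positions in each modality (assumed)
d_seq, d_shape, d_model = 100, 4, 16  # feature widths (assumed)

seq_feats = rng.normal(size=(n_seq, d_seq))        # stand-in for sequence embeddings
shape_feats = rng.normal(size=(n_shape, d_shape))  # stand-in for shape features

# Random projections stand in for learned parameters.
w_q = rng.normal(size=(d_seq, d_model)) / np.sqrt(d_seq)
w_k = rng.normal(size=(d_shape, d_model)) / np.sqrt(d_shape)
w_v = rng.normal(size=(d_shape, d_model)) / np.sqrt(d_shape)

fused, attn = cross_attention(seq_feats, shape_feats, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

In this sketch each sequence position receives a weighted summary of the shape features. Fusing three modalities, as described above, would require further attention passes (for example with structural features as the context), and how those passes are arranged is a design choice this sketch does not attempt to reproduce.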
