Abstract

Transcription factor binding sites (TFBS) and RNA-binding protein (RBP) binding sites play key roles in gene regulation, transcription, and RNA editing. Identifying and locating these sites is essential for detecting pathogenic variation in many biological processes. Some binding sites have been identified by biological experiments, but these experiments are time-consuming and expensive. Many computational approaches have been proposed as alternatives, and a few deep learning methods have recently been developed for fast and accurate prediction of binding sites. Although existing approaches achieve competent performance, many of them require specialized feature sets, and their interpretability remains challenging. To overcome these problems, we propose an interpretable deep learning technique called protein binding variable pattern predictor (PBVPP), which uses a wide variety of experimental data and performance metrics to predict binding sites. The novelty of the proposed method rests on three key factors: (i) PBVPP and its variant can extract vital features from large-scale genomic sequences obtained by high-throughput technology to predict the occurrence of TFBS and RBP binding sites; (ii) the interpretable model reveals how vital features are mined and extracts variable-length patterns for accurate prediction of binding sites; and (iii) the obtained motifs are validated against known target motifs in the TFBSshape DNA (JASPAR) database. The proposed model improves on state-of-the-art methods by 5.88% and 5.01% in area under the receiver operating characteristic (ROC) curve for TFBS and RBP prediction, respectively, and shows a substantial improvement of 60% in area under the precision-recall (PR) curve for TFBS prediction.
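The results are reported as areas under the ROC and precision-recall curves. Below is a minimal, dependency-free sketch of how these two metrics are commonly computed from a classifier's per-sequence scores. It is not the authors' evaluation code. The function names and the toy labels and scores are hypothetical. ROC AUC is computed through its rank (Mann-Whitney) interpretation, and PR AUC is computed as step-wise average precision, which is one common estimator among several.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via its rank interpretation: the probability
    that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve:
    sum over thresholds k of (recall_k - recall_{k-1}) * precision_k."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank examples from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    i = 0
    while i < len(order):
        # Examples with tied scores fall on the same threshold, so count them together.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if labels[order[j]] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


if __name__ == "__main__":
    # Hypothetical labels (1 = bound site) and model scores for six sequences.
    labels = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.1]
    print(f"ROC AUC = {roc_auc(labels, scores):.3f}")            # ROC AUC = 0.778
    print(f"PR AUC  = {average_precision(labels, scores):.3f}")  # PR AUC  = 0.806
```

Binding-site datasets usually contain far more negative than positive sequences. Under that imbalance, a random classifier's PR AUC sits near the fraction of positives rather than at 0.5, so gains measured in PR AUC and in ROC AUC are not on the same scale.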
