Abstract

Accurate splice signal prediction is central to research on gene regulation, biomedicine, and drug discovery. Effective detection of splice boundaries requires knowledge of the relationships, dependencies, and characteristics of the nucleotides in the region surrounding splice sites. Most existing computational techniques can classify true and false sites, but their classification performance depends entirely on the extracted structure-based features. State-of-the-art convolutional neural network (CNN) models achieve excellent performance on splice sites through automated feature extraction, but their interpretability is relatively weak. To address these challenges, we propose an interpretable CNN framework called InterSSPP for accurate splice site identification. InterSSPP extracts features automatically during model training. Our approach also interprets the trained model, using the learned filters to extract the patterns most indicative of splice sites. The experimental results show that InterSSPP outperforms all the state-of-the-art algorithms on eight benchmark acceptor and donor splice site datasets. We analyze the ability of the CNN filters to detect patterns by examining high average activation values. Using 10-fold cross-validation on imbalanced splice site datasets, the InterSSPP predictor obtained an average area under the precision-recall curve (AUC-PR) of 99.34% for acceptor sites and 99.29% for donor sites, and an average area under the receiver operating characteristic curve (AUC-ROC) of 99.39% and 99.29%, respectively. We show that the learned filters of our interpretable model extract patterns known to be the most important features for predicting splice junctions.
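
The idea of reading patterns out of learned filters can be illustrated with a minimal sketch. It is not the InterSSPP implementation: the function names and the hand-written two-position filter below are hypothetical, chosen only so that the filter responds to the canonical GT dinucleotide found at donor sites. A trained model would learn its filter weights from data. The sketch one-hot encodes a sequence, slides a single filter over it with a ReLU activation, and reports the window that activates the filter most strongly, along with the filter's average peak activation over a set of sequences.

```python
# Illustrative sketch only; names and weights are assumptions, not InterSSPP code.
BASES = "ACGT"

# Hypothetical filter of width 2 that rewards G followed by T.
# Each row holds the weights for (A, C, G, T) at one filter position.
GT_FILTER = [
    [-1.0, -1.0, 1.0, -1.0],
    [-1.0, -1.0, -1.0, 1.0],
]


def one_hot(seq):
    """Encode a DNA string as rows of four values in (A, C, G, T) order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def filter_activations(seq, weights, bias=0.0):
    """Slide one filter along the sequence and apply ReLU at every position."""
    encoded = one_hot(seq)
    width = len(weights)
    activations = []
    for start in range(len(encoded) - width + 1):
        total = bias
        for offset in range(width):
            for channel in range(4):
                total += weights[offset][channel] * encoded[start + offset][channel]
        activations.append(max(0.0, total))
    return activations


def top_window(seq, weights, bias=0.0):
    """Return the first subsequence with the highest activation, and that value."""
    acts = filter_activations(seq, weights, bias)
    best = max(range(len(acts)), key=acts.__getitem__)
    return seq[best:best + len(weights)], acts[best]


def mean_peak_activation(seqs, weights, bias=0.0):
    """Average, over sequences, of the filter's maximum activation."""
    peaks = [max(filter_activations(s, weights, bias)) for s in seqs]
    return sum(peaks) / len(peaks)
```

With this toy filter, `top_window("CAGGTAAGT", GT_FILTER)` returns `("GT", 2.0)`. Collecting the top windows across many sequences for a filter with a high average activation is one common way to summarize the pattern that filter has learned.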
