Methods For Splice Site Prediction Research Articles

Background: Detection of splice sites plays a key role in predicting gene structure, so the development of efficient analytical methods for splice site prediction is vital. This paper presents a novel sequence encoding approach based on adjacent di-nucleotide dependencies, in which the donor splice site motifs are encoded into numeric vectors. The encoded vectors are then used as input to Random Forest (RF), Support Vector Machine (SVM), Artificial Neural Network (ANN), Bagging, Boosting, Logistic regression, kNN and Naïve Bayes classifiers for prediction of donor splice sites.

Results: The performance of the proposed approach is evaluated on the donor splice site sequence data of Homo sapiens, collected from the Homo Sapiens Splice Sites Dataset (HS3D). RF outperformed all the other classifiers considered. RF also achieved higher prediction accuracy than the existing methods MEM, MDD, WMM, MM1, NNSplice and SpliceView when compared on an independent test dataset.

Conclusion: Based on the proposed approach, we have developed an online prediction server (MaLDoSS) to help the biological community in predicting donor splice sites. The server is freely available at http://cabgrid.res.in:8080/maldoss. Because it is computationally feasible and achieves high prediction accuracy, the proposed approach is expected to help in predicting eukaryotic gene structure.

Electronic supplementary material: The online version of this article (doi:10.1186/s13040-016-0086-4) contains supplementary material, which is available to authorized users.
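The abstract does not spell out the encoding itself, so the following is only a minimal sketch of one way adjacent di-nucleotide dependencies could be turned into a numeric vector. It assumes fixed-length, uppercase A/C/G/T motifs and uses a smoothed log-odds score per adjacent position pair; the function names, the pseudocount and the toy sequences are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

ALPHABET = "ACGT"
DINUCS = [a + b for a in ALPHABET for b in ALPHABET]

def dinuc_position_freqs(seqs, pseudocount=1.0):
    """Smoothed frequency of each dinucleotide at every adjacent position pair."""
    length = len(seqs[0])
    total = len(seqs) + pseudocount * len(DINUCS)
    freqs = []
    for i in range(length - 1):
        counts = Counter(s[i:i + 2] for s in seqs)
        freqs.append({d: (counts[d] + pseudocount) / total for d in DINUCS})
    return freqs

def encode(seq, true_freqs, false_freqs):
    """Encode a motif as one log-odds value per adjacent position pair.

    Assumption: log(true frequency / false frequency) is an illustrative
    choice, not necessarily the encoding used in the paper.
    """
    return [
        math.log(true_freqs[i][seq[i:i + 2]] / false_freqs[i][seq[i:i + 2]])
        for i in range(len(seq) - 1)
    ]

# Toy, made-up motifs (not HS3D data): donor-like vs. decoy sequences.
true_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA"]
false_sites = ["CATGTCCTA", "TTGGTCACC", "GCAGTTTGA"]

true_freqs = dinuc_position_freqs(true_sites)
false_freqs = dinuc_position_freqs(false_sites)

vec_true = encode("CAGGTAAGT", true_freqs, false_freqs)
vec_false = encode("CATGTCCTA", true_freqs, false_freqs)
```

Each motif of length n becomes a vector of n - 1 numbers. Vectors of this kind could then be passed to any of the classifiers named in the abstract, for example a random forest.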

Heterologous introns are often inaccurately or inefficiently processed in higher plants. The precise features that distinguish the process of pre-mRNA splicing in plants from splicing in yeast and mammals are unclear. One contributing factor is the prominent base compositional contrast between U-rich plant introns and flanking G+C-rich exons. Inclusion of this contrast factor in recently developed statistical methods for splice site prediction from sequence inspection significantly improved prediction accuracy. We applied the prediction tools to re-analyze experimental data on splice site selection and splicing efficiency for native and more than 170 mutated plant introns. In almost all cases, the experimentally determined preferred sites correspond to the highest scoring sites predicted by the model. In native genes, about 90% of splice sites are the locally highest scoring sites within the bounds of the flanking exon and intron. We propose that, in most cases, local context (about 50 bases upstream and downstream from a potential intron end) is sufficient to account for intrinsic splice site strength, and that competition for trans-acting factors determines splice site selection in vivo. We suggest that computer-aided splice site prediction can be a powerful tool for experimental design and interpretation.
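The two ideas in this abstract, a compositional contrast between exon and intron and selection of the locally highest scoring site, can be illustrated with a short sketch. It is not the statistical model the authors used. It assumes a candidate donor site is any GT (or GU) dinucleotide, and it scores each candidate only by how much more U/T-rich the downstream window is and how much more G+C-rich the upstream window is. The function names and the toy sequence are hypothetical. The default window of 50 bases follows the local context size suggested in the abstract.

```python
def base_fraction(segment, bases):
    """Fraction of characters in `segment` that belong to `bases`."""
    if not segment:
        return 0.0
    return sum(segment.count(b) for b in bases) / len(segment)

def donor_contrast(seq, pos, window=50):
    """Compositional contrast around a candidate donor site.

    `pos` is the index of the first intron base (the G of GT).
    Assumption: the score is simply (T enrichment downstream) plus
    (G+C enrichment upstream); a real model would add signal scores.
    """
    exon = seq[max(0, pos - window):pos]
    intron = seq[pos:pos + window]
    t_contrast = base_fraction(intron, "TU") - base_fraction(exon, "TU")
    gc_contrast = base_fraction(exon, "GC") - base_fraction(intron, "GC")
    return t_contrast + gc_contrast

def best_local_donor(seq, window=50):
    """Return the index of the highest scoring GT/GU candidate, or None."""
    candidates = [i for i in range(len(seq) - 1) if seq[i:i + 2] in ("GT", "GU")]
    if not candidates:
        return None
    return max(candidates, key=lambda i: donor_contrast(seq, i, window))

# Toy sequence: a G+C-rich "exon" (with a decoy GT) followed by a T-rich "intron".
example = "GCGTCGGCCGCG" + "GTATTTTTTATTTTT"
best = best_local_donor(example, window=10)
```

In this toy sequence the decoy GT inside the G+C-rich stretch scores lower than the GT at the exon/intron boundary, so the boundary site is chosen.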

Articles published on Methods For Splice Site Prediction

Evaluating the Accuracy of Splice Site Prediction based on Integrating Jensen-Shannon Divergence and a Polynomial Equation of Order 2

A novel method for splice sites prediction using sequence component and hidden Markov model.

Prediction of donor splice sites using random forest with a new sequence encoding approach.

An Improved Method for Splice Site Prediction in DNA Sequences Using Support Vector Machines

Prediction of splice sites in plant pre-mRNA from sequence properties
