Abstract

A vast amount of sequence data has been generated as a result of advances in DNA sequencing technology. This exponential growth calls for new and efficient methods for the analysis of DNA sequence data. Predicting genes in newly sequenced data is an essential step towards genome annotation, which in turn helps determine the function of these genes. In eukaryotes, accurate splice site prediction in DNA sequences leads to correct gene structure prediction, and it requires effective modelling of the regions surrounding these sites. Many splice site prediction methods are available in the literature, but very few of them are suitable for incorporation into a gene prediction module because of their complexity. In this paper, a splice site prediction method based on a second-order Markov model and a support vector machine (SVM) is developed. The method shows improvement over most of the existing splice site predictors in use today. The experimental results suggest that the second-order Markov model is an effective pre-processing approach and that, when combined with an SVM, it provides better classification accuracy in predicting splice sites.
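The abstract does not spell out how the second-order Markov model pre-processes a candidate site before SVM classification. The sketch below is one common way such an encoding is done, offered as an assumption rather than the paper's exact procedure. Two position-specific second-order Markov models, P(x_i | x_{i-2}, x_{i-1}), are estimated from fixed-length windows around true sites and around decoy (false) sites. Each position of a new window is then mapped to the log-odds of its base under the two models. The function names `train_position_specific_mm2` and `mm2_features`, the pseudocount of 1, and the toy donor-site windows are all hypothetical and chosen for illustration only. The resulting vectors would then be given to an SVM (for example scikit-learn's `SVC`), which is not shown here.

```python
import math
from itertools import product

BASES = "ACGT"
# All 16 dinucleotide contexts (x_{i-2}, x_{i-1}) of a second-order model.
CONTEXTS = ["".join(p) for p in product(BASES, repeat=2)]


def train_position_specific_mm2(seqs, pseudocount=1.0):
    """Estimate P(x_i | x_{i-2}, x_{i-1}) separately for each position i >= 2
    from equal-length training windows (hypothetical helper)."""
    length = len(seqs[0])
    model = []
    for i in range(2, length):
        # Pseudocounts keep unseen (context, base) pairs from getting probability 0.
        counts = {ctx: {b: pseudocount for b in BASES} for ctx in CONTEXTS}
        for s in seqs:
            ctx, base = s[i - 2:i], s[i]
            if ctx in counts and base in BASES:  # skip ambiguous bases such as N
                counts[ctx][base] += 1
        probs = {}
        for ctx, c in counts.items():
            total = sum(c.values())
            probs[ctx] = {b: c[b] / total for b in BASES}
        model.append(probs)
    return model


def mm2_features(seq, pos_model, neg_model):
    """Map a window to one log-odds value per position i >= 2:
    log P_true(x_i | context) - log P_decoy(x_i | context)."""
    feats = []
    for k, i in enumerate(range(2, len(seq))):
        ctx, base = seq[i - 2:i], seq[i]
        if ctx in pos_model[k] and base in BASES:
            feats.append(math.log(pos_model[k][ctx][base] / neg_model[k][ctx][base]))
        else:
            feats.append(0.0)  # ambiguous base: no evidence either way
    return feats


# Toy 9-nt donor windows (hypothetical data): exon ...AG | GT... intron.
true_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA"]
decoys = ["CTGGTCCAT", "GAGGTTTGC", "TCAGTACCA"]

pos_model = train_position_specific_mm2(true_sites)
neg_model = train_position_specific_mm2(decoys)
features = mm2_features("CAGGTAAGT", pos_model, neg_model)  # 7 values, one per position 2..8
```

Using log-odds per position keeps the feature vector the same length as the window (minus the two context positions). This lets a linear or kernel SVM weigh each position's evidence instead of relying on a single summed Markov score.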
