With the rapid growth of DNA sequence data, there is an urgent need for new methods that predict genes accurately. The performance of gene detection methods depends mainly on the efficiency of splice site prediction. In this paper, we propose a novel method for detecting splice sites that combines a new, effective DNA encoding method with the AdaBoost.M1 classifier. The proposed encoding is based on the multi-scale component (MSC) and the first-order Markov model (MM1). The method was applied to the HS3D dataset with repeated 10-fold cross-validation. The experimental results indicate that the new method improves classification accuracy and outperforms several current methods, including MM1-SVM, Reduced MM1-SVM, SVM-B, LVMM, DM-SVM, DM2-AdaBoost, and MSC+Pos(+APR)-SVM.
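To make the MM1 part of the encoding concrete, the following is a minimal sketch of one common way a first-order Markov model can turn an aligned DNA window into numeric features: each position is replaced by the estimated probability of its nucleotide given the preceding one. The abstract does not specify these details, so the function names `fit_mm1` and `encode_mm1`, the position-specific estimation, and the add-one pseudocount are all assumptions made for illustration. The MSC features and the AdaBoost.M1 classifier are not shown.

```python
from collections import defaultdict

ALPHABET = "ACGT"


def fit_mm1(seqs, pseudocount=1.0):
    """Estimate position-specific P(s_i | s_{i-1}) from aligned sequences.

    Assumption: all sequences have the same length and contain only A, C, G, T.
    The pseudocount is an illustrative smoothing choice, not taken from the paper.
    """
    length = len(seqs[0])
    # counts[i][(prev, cur)] holds dinucleotide counts ending at position i
    counts = [defaultdict(float) for _ in range(length)]
    for s in seqs:
        for i in range(1, length):
            counts[i][(s[i - 1], s[i])] += 1.0

    probs = [dict() for _ in range(length)]
    for i in range(1, length):
        for prev in ALPHABET:
            total = sum(counts[i][(prev, c)] for c in ALPHABET)
            total += pseudocount * len(ALPHABET)
            for cur in ALPHABET:
                probs[i][(prev, cur)] = (counts[i][(prev, cur)] + pseudocount) / total
    return probs


def encode_mm1(seq, probs):
    """Encode a sequence as the vector of its conditional probabilities."""
    return [probs[i][(seq[i - 1], seq[i])] for i in range(1, len(seq))]


# Toy example with hypothetical 4-nt windows (real HS3D windows are much longer)
train = ["AGGT", "AGGT", "CGGT", "AGGA"]
model = fit_mm1(train)
features = encode_mm1("AGGT", model)
print(features)
```

In a setup like the one the abstract describes, such a model would typically be fitted on training sequences only, and the resulting vectors would be combined with other features before being passed to a classifier.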