Abstract

Sequence alignment algorithms are frequently used in protein sequence analysis, but they are rarely efficient. To improve the efficiency of the analysis, this paper proposes an improved PST (Probabilistic Suffix Tree) model. First, by analyzing the similarity between protein sequence analysis and sequence data mining, the idea of using the PST model to analyze protein sequences is presented. Then, the standard PST model is improved with a smoothing operation suited to the features of protein sequence analysis. Next, taking the smoothed PSTs as the features of the data set, the degree of similarity between protein sequences is calculated from the similarities of the normalized sequences. Finally, the effectiveness and efficiency of the algorithm are verified on several protein sequence analysis examples.
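The pipeline outlined above (build a context model, smooth it, compare normalized scores) can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several parts are assumptions: a fixed-depth context table stands in for a pruned PST, the smoothing is the uniform-floor scheme commonly used with PSTs (the abstract does not say which smoothing the authors apply), and the similarity is a symmetric per-residue log-likelihood. All names (`build_counts`, `smoothed_prob`, `gamma_min`, `similarity`) are hypothetical, and input sequences are assumed to contain only the 20 standard residues.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def build_counts(sequence, max_depth=2):
    """Count next-symbol occurrences for every context of length 0..max_depth."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, symbol in enumerate(sequence):
        for depth in range(max_depth + 1):
            if i - depth < 0:
                break
            context = sequence[i - depth:i]  # the empty string is the root context
            counts[context][symbol] += 1
    return counts


def smoothed_prob(counts, context, symbol, gamma_min=0.001):
    """P(symbol | longest seen suffix of context) with a uniform floor (assumed smoothing)."""
    # Back off to the longest suffix of the context that occurs in the model.
    while context and context not in counts:
        context = context[1:]
    seen = counts.get(context, {})
    total = sum(seen.values())
    empirical = seen.get(symbol, 0) / total if total else 1.0 / len(AMINO_ACIDS)
    # Mix with a uniform floor so that no residue ever gets probability zero.
    return (1.0 - len(AMINO_ACIDS) * gamma_min) * empirical + gamma_min


def normalized_log_likelihood(counts, sequence, max_depth=2):
    """Average log-probability per residue, comparable across sequence lengths."""
    if not sequence:
        return 0.0
    total = 0.0
    for i, symbol in enumerate(sequence):
        context = sequence[max(0, i - max_depth):i]
        total += math.log(smoothed_prob(counts, context, symbol))
    return total / len(sequence)


def similarity(seq_a, seq_b, max_depth=2):
    """Symmetric score: how well each sequence's model predicts the other (assumed measure)."""
    model_a = build_counts(seq_a, max_depth)
    model_b = build_counts(seq_b, max_depth)
    return 0.5 * (normalized_log_likelihood(model_a, seq_b, max_depth)
                  + normalized_log_likelihood(model_b, seq_a, max_depth))
```

With this sketch, `similarity("MKVLAAGIVGL", "MKVLAAGIVGA")` is higher than `similarity("MKVLAAGIVGL", "WWPPWWPPWWP")`, because the first pair shares almost all of its contexts. Building each model takes one pass over the sequence, which is the kind of cost advantage over pairwise alignment that the abstract refers to.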
