An Efficient Binomial Model-Based Measure for Sequence Comparison and its Application

Xiaoqing Liu, Qi Dai, Lihua Li, Zerong He

doi:10.1080/07391102.2011.10508611

Abstract

Sequence comparison is one of the major tasks in bioinformatics; it can provide evidence of structural and functional conservation, as well as of evolutionary relationships. Several similarity/dissimilarity measures for sequence comparison exist, but challenges remain. This paper presents a binomial model-based measure for analyzing biological sequences. With the help of a random indicator, the occurrence of a word at any position of a sequence can be regarded as a Bernoulli random variable, and the sum of these occurrences (the word count) is well known to follow a binomial distribution. Using a recursive formula, we computed the binomial probability of the word count and proposed a binomial model-based measure based on the relative entropy. The proposed measure was tested in extensive experiments, including classification of HEV genotypes and phylogenetic analysis, and was further compared with alignment-based and alignment-free measures. The results demonstrate that the proposed measure based on the binomial model is more efficient.
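
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: word probabilities are taken as products of single-base frequencies (an independence model), the number of trials is n = L - k + 1, the binomial probabilities of the observed counts are normalized into a profile, and two profiles are compared with a symmetrized relative entropy. The function names (`binomial_pmf_recursive`, `binomial_profile`, `dissimilarity`) are hypothetical.

```python
import math
from collections import Counter
from itertools import product

def binomial_pmf_recursive(n, p, x):
    """P(X = x) for X ~ Binomial(n, p), built up from P(X = 0) with the
    recursion P(j + 1) = P(j) * (n - j) / (j + 1) * p / (1 - p)."""
    if p >= 1.0:                       # degenerate case: every trial succeeds
        return 1.0 if x == n else 0.0
    prob = (1.0 - p) ** n              # P(X = 0); may underflow for very large n
    for j in range(x):
        prob *= (n - j) / (j + 1) * p / (1.0 - p)
    return prob

def binomial_profile(seq, k, alphabet="ACGT"):
    """Normalized vector of binomial probabilities of the observed k-word counts.
    Assumption: word probability = product of base frequencies in seq."""
    n = len(seq) - k + 1               # number of word positions (trials)
    base_freq = {b: seq.count(b) / len(seq) for b in alphabet}
    counts = Counter(seq[i:i + k] for i in range(n))
    values = []
    for letters in product(alphabet, repeat=k):
        word = "".join(letters)
        p = math.prod(base_freq[ch] for ch in word)
        pmf = binomial_pmf_recursive(n, p, counts[word])
        values.append(max(pmf, 1e-300))  # floor so that logarithms stay finite
    total = sum(values)
    return [v / total for v in values]

def relative_entropy(p, q):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def dissimilarity(seq_a, seq_b, k):
    """Symmetrized relative entropy between the two binomial profiles
    (an assumed form of the measure, chosen for illustration)."""
    pa = binomial_profile(seq_a, k)
    pb = binomial_profile(seq_b, k)
    return relative_entropy(pa, pb) + relative_entropy(pb, pa)
```

For example, `dissimilarity("ACGTACGTAC", "AAAACCCCGG", 2)` returns a positive value, while comparing a sequence with itself returns 0. The recursion avoids evaluating factorials directly, which is the main reason for using it; for long sequences a log-space variant would be more robust against underflow.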

Full Text