Abstract

The presence of long-range correlation (LRC), as opposed to a Markov chain of finite order, is of primary interest in the study of symbol sequences, such as DNA. Among other approaches, information-theoretic methods have been developed for Markov chain order estimation. In this work, we consider the Tsallis entropy and define the Tsallis conditional mutual information (TCMI) for Markov chain order estimation, which is better suited to identifying large orders that effectively suggest LRC. The TCMI of order m for a Tsallis parameter q is computed for increasing m, and a significance test is performed for each m until the TCMI is found non-significant. Randomization and parametric significance tests for the TCMI are developed. For the latter, the null distribution of the TCMI is approximated by a gamma distribution, and analytic expressions for its parameters are derived. We assess the accuracy of order estimation with the two tests for the TCMI and compare them to the respective tests using the Shannon entropy. Extensive simulations on Markov chains of different orders and structures of the transition probability matrix show that the Shannon and Tsallis tests with parameter q=2 have similar performance for small orders. When the problem becomes more demanding, for higher Markov chain orders and LRC, the Tsallis tests with q=2 converge better to the correct order as the sequence length N increases. Finally, the Tsallis tests compare favorably to the Shannon tests on real DNA sequences.
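The procedure sketched in the abstract (compute the TCMI for increasing order m, test each value for significance, and stop at the first non-significant one) can be illustrated with a minimal plug-in estimator. This is only a sketch under stated assumptions, not the paper's implementation: the TCMI is formed here from Tsallis entropies of overlapping symbol blocks via the combination 2*H_q(m) - H_q(m+1) - H_q(m-1), and significance is judged with a simple shuffle-based randomization test (the paper's exact TCMI definition and surrogate scheme may differ). Note that, because the Tsallis entropy is non-additive, this statistic need not vanish under independence, which is precisely why the observed value is compared against surrogate values rather than against zero.

```python
import numpy as np
from collections import Counter

def tsallis_entropy(probs, q):
    # S_q = (1 - sum_i p_i^q) / (q - 1); recovers the Shannon entropy as q -> 1
    p = np.asarray([pi for pi in probs if pi > 0.0])
    if abs(q - 1.0) < 1e-12:
        return float(-np.sum(p * np.log(p)))
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))

def block_probs(seq, m):
    # Empirical probabilities of all overlapping words (blocks) of length m
    if m == 0:
        return [1.0]  # the empty block occurs with probability one
    n = len(seq) - m + 1
    counts = Counter(tuple(seq[i:i + m]) for i in range(n))
    return [c / n for c in counts.values()]

def tcmi(seq, m, q=2.0):
    # Plug-in estimate of I_q(x_t ; x_{t-m} | x_{t-m+1}, ..., x_{t-1})
    # via the block-entropy combination 2*H_q(m) - H_q(m+1) - H_q(m-1).
    # (Hypothetical form for illustration; the paper's definition may differ.)
    h = lambda k: tsallis_entropy(block_probs(seq, k), q)
    return 2.0 * h(m) - h(m + 1) - h(m - 1)

def randomization_test(seq, m, q=2.0, n_surrogates=100, alpha=0.05, rng=None):
    # Reject H0 (no dependence at lag m) when the observed TCMI exceeds the
    # (1 - alpha) quantile of TCMI values on randomly shuffled surrogates,
    # which destroy all temporal dependence but keep the symbol frequencies.
    rng = rng or np.random.default_rng(0)
    observed = tcmi(seq, m, q)
    surr = [tcmi(rng.permutation(seq).tolist(), m, q) for _ in range(n_surrogates)]
    return observed > np.quantile(surr, 1.0 - alpha)

def estimate_order(seq, q=2.0, max_order=5, **kw):
    # Increase m until the TCMI is first found non-significant;
    # the estimated Markov chain order is the last significant m.
    for m in range(1, max_order + 1):
        if not randomization_test(seq, m, q, **kw):
            return m - 1
    return max_order
```

For example, on a binary first-order Markov chain with a high probability of repeating the previous symbol, `estimate_order(seq, q=2.0)` returns 1 for reasonably long sequences: the TCMI at m=1 clearly exceeds its surrogate values, while at m=2 it does not.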
