Abstract
Given the widespread use of lossless compression algorithms to approximate algorithmic (Kolmogorov-Chaitin) complexity, and given that generic lossless compression algorithms typically fail to characterize features other than statistical ones, making them no different from entropy evaluations, here we explore an alternative and complementary approach. We study formal properties of a Levin-inspired measure m calculated from the output distribution of small Turing machines. We introduce and justify finite approximations mk that have been used in some applications as an alternative to lossless compression algorithms for approximating algorithmic (Kolmogorov-Chaitin) complexity. We provide proofs of the relevant properties of both m and mk and compare them to Levin's Universal Distribution. We provide error estimations of mk with respect to m. Finally, we present an application to integer sequences from the On-Line Encyclopedia of Integer Sequences, which suggests that our AP-based measures may characterize nonstatistical patterns. We also report interesting correlations with the textual, function, and program description lengths of those sequences.
Highlights
In previous papers [4, 5], we introduced a novel method to approximate K based on the seminal concept of algorithmic probability, introduced by Solomonoff [6] and further formalized by Levin [3], who proposed the concept of uncomputable semimeasures and the so-called Universal Distribution.
We found that the textual description length as derived from the database is, as illustrated above, best correlated with the AP-based (BDM) measure, with a Spearman test statistic of 0.193, followed by compression with 0.17, and then entropy with 0.09 (Figure 2).
Summary
Strings that can be produced by programs of length n but not by programs of length n − 1 are produced less frequently by programs of length n, as only very specific programs can generate them (see Section 14.6 in [8]). This theorem elegantly connects probability to complexity: the frequency (or probability) of occurrence of a string is related to its algorithmic (Kolmogorov-Chaitin) complexity. A key property of m and K is their universality: the choice of the Turing machine used to compute the distribution is only relevant up to an (additive) constant, independent of the objects.
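The finite approximation mk can be illustrated by exhaustively enumerating a small Turing machine space, counting how often each output string is produced by a halting machine, and then estimating complexity as −log2 of that frequency, per the coding theorem. The sketch below is a minimal, self-contained illustration of this idea, not the exact enumeration, machine formalism, or output convention used in the paper: it enumerates all (2-state, 2-symbol) machines with an explicit halt state, runs each from a blank (all-0) tape under an assumed step budget, and takes the written tape segment as the output.

```python
import itertools
import math

def run(table, n_states, max_steps=100):
    """Run one (n_states, 2-symbol) Turing machine from a blank (all-0) tape.
    Returns the written tape segment as a string if it halts within
    max_steps, else None (treated as non-halting for this approximation)."""
    tape, pos, state = {}, 0, 0
    lo = hi = 0
    for _ in range(max_steps):
        write, move, new_state = table[(state, tape.get(pos, 0))]
        tape[pos] = write
        lo, hi = min(lo, pos), max(hi, pos)
        pos += move
        state = new_state
        if state == n_states:  # state index n_states is the halt state
            return ''.join(str(tape.get(i, 0)) for i in range(lo, hi + 1))
    return None

def mk_distribution(n_states=2):
    """Empirical output distribution over all (n_states, 2) machines:
    the frequency of each string among the halting machines' outputs."""
    keys = [(q, s) for q in range(n_states) for s in (0, 1)]
    actions = [(w, m, q2) for w in (0, 1) for m in (-1, 1)
               for q2 in range(n_states + 1)]
    counts, halting = {}, 0
    for combo in itertools.product(actions, repeat=len(keys)):
        out = run(dict(zip(keys, combo)), n_states)
        if out is not None:
            halting += 1
            counts[out] = counts.get(out, 0) + 1
    return {s: c / halting for s, c in counts.items()}

dist = mk_distribution(2)
for s in sorted(dist, key=dist.get, reverse=True)[:5]:
    print(f"m({s!r}) = {dist[s]:.4f}   K(s) ≈ {-math.log2(dist[s]):.2f} bits")
```

As the summary above describes, simple one-symbol strings are output by many machines and so receive high m-values (low complexity estimates), while strings requiring very specific programs appear rarely and receive high complexity estimates.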