Abstract

Lossless compression algorithms are widely used to approximate algorithmic (Kolmogorov-Chaitin) complexity, yet generic lossless compressors usually fall short of characterizing any features other than statistical ones, not essentially different from what entropy evaluations capture. Here we explore an alternative and complementary approach. We study formal properties of a Levin-inspired measure m calculated from the output distribution of small Turing machines. We introduce and justify finite approximations mk that have been used in some applications as an alternative to lossless compression algorithms for approximating algorithmic (Kolmogorov-Chaitin) complexity. We provide proofs of the relevant properties of both m and mk and compare them to Levin's Universal Distribution. We provide error estimations of mk with respect to m. Finally, we present an application to integer sequences from the On-Line Encyclopedia of Integer Sequences, which suggests that our AP-based measures may characterize nonstatistical patterns, and we report interesting correlations with the textual, function, and program description lengths of those sequences.
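
To make the construction concrete, the following Python sketch (our illustration, not the authors' code) shows the idea behind m and its finite approximations mk: enumerate every Turing machine in a small rule space, run each on a blank tape for a bounded number of steps, and estimate the probability of a string as the fraction of halting machines that output it, so that -log2 of that frequency approximates complexity. The paper computes m5 over 5-state machines; for feasibility this sketch uses 2 states, and the step bound and output convention are simplifying assumptions rather than the paper's exact formalism.

  from itertools import product
  from collections import Counter
  from math import log2

  STATES, SYMBOLS, MAX_STEPS = 2, 2, 50

  def run(machine):
      """Simulate one machine from a blank (all-0) tape; return the
      visited portion of the tape as a string if the machine halts
      within MAX_STEPS, else None."""
      tape, pos, state = {}, 0, 0
      lo = hi = 0
      for _ in range(MAX_STEPS):
          write, move, nxt = machine[(state, tape.get(pos, 0))]
          tape[pos] = write
          pos += move
          lo, hi = min(lo, pos), max(hi, pos)
          if nxt == STATES:  # the extra state acts as the halting state
              return ''.join(str(tape.get(i, 0)) for i in range(lo, hi + 1))
          state = nxt
      return None  # did not halt within the step bound

  # Each (state, symbol) pair maps to (symbol to write, head move,
  # next state); next state == STATES means halt.
  choices = list(product(range(SYMBOLS), (-1, 1), range(STATES + 1)))
  keys = list(product(range(STATES), range(SYMBOLS)))

  counts = Counter()
  for rules in product(choices, repeat=len(keys)):
      out = run(dict(zip(keys, rules)))
      if out is not None:
          counts[out] += 1

  total = sum(counts.values())
  for s, c in counts.most_common(5):
      # frequency approximates m_k(s); -log2(frequency) approximates K(s)
      print(s, c / total, round(-log2(c / total), 3))

With only 2 states the space has 12^4 = 20736 machines, small enough to enumerate exhaustively; the 5-state space behind m5 requires the large-scale computation discussed in the Computing m5 section below.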

Highlights

  • Lossless compression algorithms are widely used to approximate algorithmic (Kolmogorov-Chaitin) complexity, yet generic compressors usually capture only statistical features, not different from entropy evaluations; here we explore an alternative and complementary approach.

  • In previous papers [4, 5] we introduced a novel method to approximate K based on the seminal concept of algorithmic probability, introduced by Solomonoff [6] and further formalized by Levin [3], who proposed the concept of uncomputable semimeasures and the so-called Universal Distribution.

  • We found that the textual description length, as derived from the database, is best correlated with the AP-based (BDM) measure, with a Spearman test statistic of 0.193, followed by compression with 0.17 and entropy with 0.09 (Figure 2); a minimal sketch of how such a rank correlation is computed follows this list.
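
The rank correlations reported in the last highlight can be computed with scipy's spearmanr; the snippet below is a sketch with made-up placeholder values, not data from the paper.

  from scipy.stats import spearmanr

  # Hypothetical values for five sequences: a description length and a
  # complexity estimate per sequence (placeholders, not the paper's data).
  description_lengths = [64, 112, 40, 150, 98]
  bdm_estimates = [21.5, 33.0, 12.8, 41.2, 30.1]

  # spearmanr returns the Spearman rank-correlation statistic and p-value.
  rho, p_value = spearmanr(description_lengths, bdm_estimates)
  print(rho, p_value)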

Summary

Algorithmic Information Measures

Strings that can be produced by programs of length n but not by any program of length n − 1 are produced less frequently, as only very specific programs can generate them (see Section 14.6 in [8]). This result, the algorithmic coding theorem, elegantly connects probability to complexity: it relates the frequency (or probability) of occurrence of a string to its algorithmic (Kolmogorov-Chaitin) complexity. A key property of m and K is their universality: the choice of the Turing machine used to compute the distribution is only relevant up to an additive constant, independent of the objects.
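
In standard notation (our restatement of the usual formulations, not a quotation from the paper), the Universal Distribution, the coding theorem, and the invariance behind the universality claim read, in LaTeX:

  m(s) = \sum_{p \,:\, U(p) = s} 2^{-|p|}           \quad \text{(Levin's Universal Distribution)}
  \left| -\log_2 m(s) - K(s) \right| < c            \quad \text{(algorithmic coding theorem)}
  \left| K_U(s) - K_{U'}(s) \right| \le c_{U,U'}    \quad \text{(invariance theorem)}

Here U and U' are universal prefix-free Turing machines, |p| is the length of program p, and the constants c and c_{U,U'} depend on the machines but not on the string s.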

The Turing Machine Formalism
Turing Machines as a Prefix-Free Set of Programs
A Levin-Style Algorithmic Measure
Computing m5
Algorithmic Complexity of Integer Sequences
Conclusion