Abstract

An important aspect of using entropy-based models and proposed “synthetic languages” is the seemingly simple task of knowing how to identify the probabilistic symbols. If the system has discrete features, this task may be trivial; however, for observed analog behaviors described by continuous values, the question arises of how such symbols should be determined. This task of symbolization extends the concept of scalar and vector quantization to consider explicit linguistic properties. Unlike previous quantization algorithms, where the aim is primarily data compression and fidelity, the goal in this case is to produce a symbolic output sequence which incorporates some linguistic properties and is hence useful in forming language-based models. In this paper, we therefore present methods for symbolization which take such properties into account in the form of probabilistic constraints. In particular, we propose new symbolization algorithms which constrain the symbols to have a Zipf–Mandelbrot–Li distribution, which approximates the behavior of language elements. We introduce a novel constrained EM algorithm which is shown to effectively learn to produce symbols which approximate a Zipfian distribution. We demonstrate the efficacy of the proposed approaches on examples using real-world data in different tasks, including the translation of animal behavior into a possible human-understandable language equivalent.
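For reference, the Zipf–Mandelbrot family referred to here has the rank-frequency form below. The abstract does not spell out the exact parameterization of the Zipf–Mandelbrot–Li variant, so this is the standard form on which it is based, with symbols of our own choosing:

$$
p(k) = \frac{(k+q)^{-s}}{\sum_{i=1}^{N} (i+q)^{-s}}, \qquad k = 1, \dots, N,
$$

where $k$ is a symbol's frequency rank, $N$ is the number of symbols, $q \ge 0$ is the Mandelbrot shift, and $s > 0$ is the exponent. Classical Zipf corresponds to $q = 0$, $s = 1$; Li's random-text analysis ties the parameters to the size of the underlying alphabet.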

Highlights

  • Language is the primary way in which humans function intelligently in the world. Without language, it is almost inconceivable that we as a species could survive.

  • In contrast to classical value-based models, such as those employed in signal processing, or even quantized models employing discrete values, such as those found in classifiers, we propose that the next phase of AI systems may be based on the concept of synthetic languages.

  • It should be noted that our intention is not to validate symbolization using simulations; rather, we present some potential applications which show that useful results can be obtained.

Summary

Introduction

Language is the primary way in which humans function intelligently in the world. Without language, it is almost inconceivable that we as a species could survive. Instead of modeling systems through hard classifications of some measured features, a synthetic language approach could be useful for developing an understanding of meaning using behavioral models based on sequences of probabilistic events. These events might be captured as simple language elements. The goal of symbolization can be differentiated from quantization in that the properties required of language primitives may be very different from those required for efficient data compression or even fidelity of reconstruction. These properties may include metrics of robustness, intelligibility, identifiability, and learnability. We demonstrate the efficacy of the proposed approaches on examples using real-world data in quite different tasks, including the translation of the movement of a biological agent into a potential human language equivalent.
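As an illustration of how such a distributional constraint can be imposed at the quantizer level, the sketch below is our own minimal construction, not the paper's algorithm; the function names and parameter values are illustrative assumptions. It places quantizer thresholds at the data quantiles implied by a Zipf–Mandelbrot target, so the resulting symbol frequencies follow that target by construction.

```python
# Minimal sketch (not the paper's algorithm): a quantile-based symbolizer
# that forces the empirical symbol frequencies of a continuous signal to
# follow a Zipf-Mandelbrot target distribution.
import numpy as np

def zipf_mandelbrot(n_symbols, s=1.0, q=2.7):
    """Target rank-frequency distribution p(k) proportional to 1/(k+q)^s."""
    ranks = np.arange(1, n_symbols + 1)
    weights = 1.0 / (ranks + q) ** s
    return weights / weights.sum()

def symbolize(signal, n_symbols=16, s=1.0, q=2.7):
    """Map a 1-D continuous signal to symbol ids whose frequencies
    approximate the Zipf-Mandelbrot target, by placing the quantizer
    thresholds at the corresponding data quantiles."""
    p = zipf_mandelbrot(n_symbols, s, q)
    # Cumulative target mass gives the quantile levels for the bin edges.
    edges = np.quantile(signal, np.cumsum(p)[:-1])
    return np.digitize(signal, edges)  # symbol ids 0 .. n_symbols-1

# Example: symbolize a noisy trajectory and inspect the rank-frequency law.
x = np.cumsum(np.random.randn(10000))       # stand-in for observed behavior
symbols = symbolize(x, n_symbols=16)
counts = np.sort(np.bincount(symbols))[::-1]
print(counts / counts.sum())                # should roughly follow p(k)
```

Sorting the symbol counts in descending order recovers the target rank-frequency curve; the learned, EM-based approaches described in the paper go further by also adapting the symbol regions to the structure of the data.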

Aspects of Symbolization
Zipf–Mandelbrot–Li Symbolization
Maximum Intelligibility Symbolization
Learning Synthetic Language Symbols
Authorship Classification
Symbol Learning Using an LCEM Algorithm (see the sketch after this outline)
Potential Translation of Animal Behavior into Human Language
Conclusions
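To make the "Symbol Learning Using an LCEM Algorithm" entry above concrete, here is a minimal, hypothetical sketch of a linguistically constrained EM update: standard EM for a one-dimensional Gaussian mixture, with an extra step that blends the rank-ordered mixing weights toward a Zipf–Mandelbrot target. The paper's LCEM is not specified in this summary, so the blending scheme and all parameter values here are assumptions.

```python
# Minimal sketch (an assumption, not the authors' LCEM): EM for a 1-D
# Gaussian mixture in which, after each M-step, the mixing weights are
# blended toward a Zipf-Mandelbrot target over rank-ordered components,
# so the learned symbols approximate a Zipfian usage distribution.
import numpy as np

def lcem(x, n_symbols=8, n_iter=50, s=1.0, q=2.7, lam=0.5):
    ranks = np.arange(1, n_symbols + 1)
    target = 1.0 / (ranks + q) ** s
    target /= target.sum()                      # Zipf-Mandelbrot weights

    # Initialize means from data quantiles, unit variances, uniform weights.
    mu = np.quantile(x, np.linspace(0.05, 0.95, n_symbols))
    var = np.full(n_symbols, x.var())
    w = np.full(n_symbols, 1.0 / n_symbols)

    for _ in range(n_iter):
        # E-step: responsibilities under the current mixture.
        dens = (w / np.sqrt(2 * np.pi * var)) * \
               np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
        r = dens / dens.sum(axis=1, keepdims=True)

        # M-step: standard Gaussian mixture updates.
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        w = nk / nk.sum()

        # Constraint step: pull the rank-ordered weights toward the target.
        order = np.argsort(w)[::-1]
        w[order] = (1 - lam) * w[order] + lam * target

    return mu, var, w

x = np.concatenate([np.random.randn(3000), 3 + 0.5 * np.random.randn(1000)])
mu, var, w = lcem(x)
print(np.sort(w)[::-1])   # mixing weights, roughly Zipfian by construction
```

The blending parameter lam trades off fit to the data against adherence to the Zipfian constraint; lam = 0 reduces the sketch to ordinary EM.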