Survey: Finite-state technology in natural language processing

Andreas Maletti

doi:10.1016/j.tcs.2016.05.030

Open Access

https://doi.org/10.1016/j.tcs.2016.05.030

Journal: Theoretical Computer Science
Publication Date: May 25, 2016
Citations: 7
License type: publisher-specific-oa

Affiliation: University of Stuttgart

Abstract

In this survey, we will discuss current uses of finite-state information in several statistical natural language processing tasks. To this end, we will review standard approaches in tokenization, part-of-speech tagging, and parsing, and illustrate the utility of finite-state information and technology in these areas. The particular problems were chosen to allow a natural progression from simple prediction to structured prediction. We aim for a presentation that is formal enough for readers with a background in automata theory to appreciate the contribution of finite-state approaches, but we will not discuss practical issues outside the core ideas. We provide instructive examples and pointers to the relevant literature for all constructions. We close with an outlook on finite-state technology in statistical machine translation.

Full Text