Developing Multilingual Automatic Semantic Annotation Systems

Laura Löfberg, Paul Rayson

doi:10.1017/9781108525695.006

Abstract

We report the development of a multilingual system for the semantic analysis of text. Research on the English Semantic Tagger started in 1990, and the system has since been ported, first to Finnish and Russian, and thereafter to Arabic, Chinese, Czech, Dutch, French, Italian, Malay, Portuguese, Spanish, Urdu, and Welsh. The semantic taggers for English, Finnish, and Russian were developed in relatively similar ways, involving manual construction of the semantic lexicons, whereas, to speed up the research, the semantic lexicons for the other languages were later created with new bootstrapping methods, including computational approaches. We describe these manual and automatic processes and envisage directions for future development. The resulting multilingual framework of semantic taggers, based on equivalent semantic lexicons and one common semantic taxonomy, offers a wealth of potential applications, which this chapter also illustrates. In addition to monolingual applications for these semantic taggers, it is also possible to create cross-lingual and multilingual applications. Furthermore, while the existing semantic analysis systems are designed for the analysis of general language, such systems can also be tailored to a specific purpose, to deal more accurately with one particular domain or task.

Full Text