Abstract

The number of documents published on the Web in languages other than English grows every year. As a consequence, the need to extract useful information from texts in different languages increases, highlighting the importance of research into Open Information Extraction (OIE) techniques. Most OIE methods deal with features of a single language; few approaches tackle multilingual aspects, and in those approaches multilingualism is restricted to processing text in different languages rather than exploiting cross-linguistic resources, which results in low precision due to the use of general rules. Multilingual methods have been applied to numerous problems in Natural Language Processing, achieving satisfactory results and demonstrating that knowledge acquired for one language can be transferred to other languages to improve the quality of the extracted facts. We argue that a multilingual approach can enhance OIE methods, since it allows OIE systems to be evaluated and compared and can be applied to the collected facts. In this work, we discuss how transferring knowledge between languages can improve fact acquisition in multilingual approaches. We provide a roadmap of the Multilingual Open IE area based on state-of-the-art studies. Additionally, we evaluate the transfer of knowledge to improve the quality of the facts extracted in each language. Moreover, we discuss the importance of a parallel corpus for evaluating and comparing multilingual systems.

Highlights

  • Textual data are the main form of data published on the Web, and the number of published documents increases daily

  • We investigated approaches to Multilingual Open Information Extraction

  • We presented a systematic mapping study to analyze the Multilingual Open Information Extraction area and performed initial experiments on the use of multilingual resources to improve the performance of Open Information Extraction (OIE) systems


Summary

Introduction

Textual data are the main form of data published on the Web, and the number of published documents increases daily. As much as the Web is a valuable source of information and knowledge, the sheer amount of available pages renders it impossible for a person to explore all of the available information on any subject. It is therefore of great importance to have methods for extracting useful information from texts. Information Extraction (IE), also called Text Analysis, studies computational methods for identifying structured semantic information in unstructured sources such as documents or web pages. IE methods usually aim to identify semantic information expressed in natural language, such as discourse entities and their relations, and to store it in a standard, computation-friendly representation, such as relational tuples, for further use.
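To make the notion of a relational tuple concrete, the sketch below maps simple English sentences to (argument1, relation, argument2) tuples. It is a minimal illustration under our own assumptions: the Triple class, the hand-written verb pattern, and the example sentences are hypothetical and are not taken from any surveyed system, which typically rely on syntactic parsing or learned models rather than a single rule.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Triple:
    """A relational tuple of the form (argument1, relation, argument2)."""
    arg1: str
    relation: str
    arg2: str


def extract_triples(sentence: str) -> List[Triple]:
    """Extract tuples with a toy verb-centered pattern (illustrative only)."""
    pattern = re.compile(
        r"^(?P<arg1>[A-Z][\w ]+?)\s+"                        # first argument
        r"(?P<rel>(?:is|was|are|were|founded|acquired|created|wrote)"
        r"(?: [a-z]+)?)\s+"                                   # relation phrase
        r"(?P<arg2>.+?)\.?$"                                  # second argument
    )
    match = pattern.match(sentence.strip())
    if not match:
        return []
    return [Triple(match["arg1"], match["rel"], match["arg2"])]


if __name__ == "__main__":
    for s in ["Marie Curie was born in Warsaw.",
              "Google acquired YouTube in 2006."]:
        print(extract_triples(s))
```

Running the sketch yields tuples such as (Marie Curie, was born, in Warsaw); an actual OIE system would produce comparable tuples for open-ended relations across arbitrary text.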

