Extraction of ontology schema components from financial news

Mihaela Vela

doi:10.22028/d291-23594

Abstract

In this thesis we describe an incremental, multi-layer, rule-based methodology for the extraction of ontology schema components from German financial newspaper text. By extraction of ontology schema components we mean the detection of new concepts and of relations between these concepts for ontology building. Detecting concepts and the relations between them corresponds to building the intensional part of an ontology and is often referred to as ontology learning [1]. We present the process of rule generation for the extraction of ontology schema components as well as the application of the generated rules. Most of the research on ontology learning (Cimiano et al., 2005; Aguado de Cea et al., 2008) investigates the learning potential at the sentential level, after the corpus has undergone a deep linguistic analysis [2]. In this thesis we present a bottom-up method for the extraction of ontology schema components, showing that the extraction of new classes and relations can be initiated at a lower level, using shallow and robust linguistic analysis. We start the investigation by extracting candidates for ontology classes and relations from plain text, applying text-based and string-based patterns. We then go one step further and apply the knowledge accumulated in the previous step to Part-of-Speech (PoS) and semantically annotated text, validating in this way

[1] Ontology learning is the process of semi-automatic support in ontology development (Buitelaar et al., 2005).
[2] By deep linguistic analysis we mean grammatical function analysis.
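
To make the first step more concrete, the following is a minimal sketch of what text-based and string-based patterns over plain text could look like. It is not taken from the thesis: the "X wie Y" ("X such as Y") pattern, the compound-suffix heuristic, the relation label "subClassOf" and all function names are assumptions chosen purely for illustration.

```python
import re

# Hypothetical text-based pattern (an assumption, not the thesis' rule set):
# "X wie Y" ("X such as Y") suggests that Y may be a subclass of X.
SUCH_AS = re.compile(r"\b([A-ZÄÖÜ][\wäöüß-]+) wie ([A-ZÄÖÜ][\wäöüß-]+)")


def text_pattern_candidates(sentence):
    """Return (candidate, relation, parent) triples found by the text pattern."""
    return [(hypo, "subClassOf", hyper) for hyper, hypo in SUCH_AS.findall(sentence)]


def string_pattern_candidates(nouns, known_classes):
    """Hypothetical string-based pattern: a compound noun that ends in the
    name of a known class is proposed as a subclass candidate of that class."""
    candidates = []
    for noun in nouns:
        for cls in known_classes:
            if noun != cls and noun.lower().endswith(cls.lower()):
                candidates.append((noun, "subClassOf", cls))
    return candidates


if __name__ == "__main__":
    print(text_pattern_candidates("Anleger kaufen Wertpapiere wie Aktien."))
    print(string_pattern_candidates(["Aktienfonds", "Fonds", "Bank"], ["Fonds"]))
```

Both functions only propose candidates. In the spirit of the layered approach described above, such candidates would still need to be validated on PoS-tagged and semantically annotated text.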
