Knowledge extraction from textual data and performance evaluation in an unsupervised context

Yohann Chasseray, Anne-Marie Barthe-Delanoë, Stéphane Négny, Jean-Marc Le Lann

doi:10.1016/j.ins.2023.01.150

Abstract

Among the upcoming challenges in monitoring systems, the aggregation, synthesis and management of knowledge through ontological structures hold an essential place. Existing knowledge extraction systems often use a supervised approach that relies on annotated data, which implicitly requires a tedious annotation process. Current research is moving toward unsupervised or semi-supervised systems, which allow a wider range of knowledge extraction. Evaluating such systems, which perform knowledge extraction using natural language processing methods, requires performance indicators. The indicators usually used in such evaluations have limitations in the specific context of knowledge extraction for unsupervised ontology population. New evaluation methods are therefore needed because of the singular nature of the harvested data, especially when these data are not annotated. Hence, this article proposes a method for measuring performance in an unsupervised context where reference data and extracted data do not overlap optimally. The proposed evaluation method is original in that it exploits data that serve as a reference but are not specifically linked to the data used for extraction. To apply the performance measure to concrete cases, this paper also presents an unsupervised, self-feeding, rule-based approach for domain-independent ontology population from textual data.

Full Text