Abstract

It is beneficial to automate the derivation of concept hierarchies from corpora, since manual construction of concept hierarchies is typically a time-consuming and resource-intensive process. The overall process of learning concept hierarchies from corpora encompasses a sequence of steps: parsing the text into sentences, splitting the sentences, and then tokenising them. After a lemmatisation step, object-attribute pairs are extracted using formal concept analysis (FCA). However, the resulting formal context may contain uninteresting and erroneous pairs, and generating it can be time-consuming; reducing the size of the formal context is therefore required to remove such pairs and to shorten the time needed to extract the concept lattice and the corresponding concept hierarchies. On this premise, this study proposes two frameworks: (1) a framework that reviews the current process of deriving concept hierarchies from corpora using formal concept analysis (FCA); and (2) a framework that reduces the ambiguity of the formal context produced by the first framework using an adaptive version of the evolutionary clustering algorithm (ECA*). Experiments are conducted by applying 385 sample corpora from Wikipedia to the two frameworks to examine the reduction in formal context size and the concept lattices and concept hierarchies it yields. The lattice obtained from the reduced formal context is evaluated against the standard one using concept lattice invariants. Accordingly, the homomorphism between the two lattices preserves the quality of the resulting concept hierarchies at 89% relative to the basic ones, and the reduced concept lattice inherits the structural relations of the standard one. The adaptive ECA* is also examined against four baseline algorithms (fuzzy K-means, the JBOS approach, the AddIntent algorithm, and FastAddExtent) to measure execution time on random datasets with different densities (fill ratios). The results show that adaptive ECA* constructs the concept lattice faster than the other techniques across the different fill ratios.
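For readers unfamiliar with FCA, the sketch below is a minimal, self-contained illustration of the core idea the abstract relies on: a formal context built from object-attribute pairs (for example, lemmatised noun/verb co-occurrences extracted from a corpus) yields formal concepts via the derivation (prime) operators. It is not the paper's implementation and does not reflect the adaptive ECA* reduction step; all data values and names are hypothetical.

```python
# Minimal FCA sketch (illustrative only, not the authors' pipeline):
# build a formal context K = (G, M, I) from hypothetical object-attribute
# pairs and enumerate its formal concepts by brute force.
from itertools import combinations

# Hypothetical pairs: objects (nouns) and the attributes (verbs) they
# co-occur with after tokenisation and lemmatisation.
pairs = [
    ("hotel", "book"), ("hotel", "rent"),
    ("apartment", "rent"), ("car", "rent"), ("car", "drive"),
]

objects = sorted({o for o, _ in pairs})      # G
attributes = sorted({a for _, a in pairs})   # M
incidence = set(pairs)                       # I

def common_attributes(objs):
    """A' : attributes shared by every object in objs."""
    return {a for a in attributes if all((o, a) in incidence for o in objs)}

def common_objects(attrs):
    """B' : objects possessing every attribute in attrs."""
    return {o for o in objects if all((o, a) in incidence for a in attrs)}

# A formal concept is a pair (A, B) with A' = B and B' = A.
# For any object set A, (A'', A') is a formal concept, so closing every
# subset of G enumerates all concepts of this small context.
concepts = set()
for r in range(len(objects) + 1):
    for objs in combinations(objects, r):
        intent = common_attributes(set(objs))
        extent = common_objects(intent)
        concepts.add((frozenset(extent), frozenset(intent)))

for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), "<->", sorted(intent))
```

Each printed (extent, intent) pair is a node of the concept lattice; ordering the extents by set inclusion gives the hierarchy that the reduced formal context described in the abstract is meant to preserve. Removing uninteresting or erroneous pairs from `pairs` shrinks the context and, consequently, the number of concepts that must be computed.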

Highlights

  • The Semantic Web is an extended web of machine-readable data, which allows data to be processed by machines, directly or indirectly [1]

  • The Semantic Web depends on structured ontologies to organize the underlying data and provide a detailed and portable interpretation for computing machines [2]

  • Besides reviewing the current framework for concept hierarchy construction from text and concept lattice size reduction, this study focuses on proposing a framework for deriving concept hierarchies that takes advantage of reducing the formal context size to produce a meaningful lattice and concept hierarchy


Introduction

The Semantic Web is an extended web of machine-readable data, which allows data to be processed by machines, directly or indirectly [1]. As an expansion of the current Web, the Semantic Web can add meaning to World Wide Web content and support automated services on the basis of semantic representations. The Semantic Web depends on structured ontologies to organize the underlying data and provide a detailed and portable interpretation for computing machines [2]. Ontologies, as an essential part of the Semantic Web, are commonly used in Information Systems. The proliferation of ontologies demands that they be developed quickly and efficiently to bring about the Semantic Web's success [1]. Manually constructing ontologies is still a time-consuming and tedious process, with the bottleneck
