A Similarity Measure in Formal Concept Analysis Containing General Semantic Information and Domain Information

Fugang Wang, Wulin Zhang, Nianbin Wang, Shaobin Cai

doi:10.1109/access.2020.2988689

Abstract

Formal concept analysis (FCA) is gaining favor among a growing number of big data scientists because of its unique advantages. Concept similarity measurement is key to FCA-based applications. Most previous methods are based on set theory and pay little attention to semantic information, whereas methods that do focus on semantic information usually rely on ontologies or knowledge bases to obtain the relevant semantic knowledge. However, such knowledge-based methods have difficulty obtaining the domain knowledge contained in formal contexts (datasets), so they are not well suited to domain text data. To tackle these problems, this paper proposes a novel formal concept similarity measure that synthesizes the Semantic information in knowledge bases and the Domain information in the formal context (S&D measure). S&D uses word vectors as word representations to capture the semantic information in general knowledge bases, and defines novel semantic relations between intent words to capture the domain information contained in the data itself. It measures the similarity of concepts more comprehensively and precisely, particularly in a domain textual formal context, and it can be applied automatically and without supervision, with no need for a knowledge base, ontology, or external corpus. Experiments show that this method correlates better with human judgment than related work.

Highlights

Formal Concept Analysis (FCA) was introduced by R
This paper proposes a novel formal concept similarity measure that synthesizes the Semantic information in knowledge bases and Domain information in the formal context (S&D measure), but it is independent of any knowledge base, ontology, or corpus
We propose the term Inverse Concept Frequency (ICF)

Summary

INTRODUCTION

Formal Concept Analysis (FCA) was introduced by R. Paper [16] introduced rough set theory into a measure model. It finds the meet-irreducible and join-irreducible concepts by means of the suprema and infima structures in a lattice. Using these two kinds of concepts, it calculates the Tversky similarities [31] between the extents and between the intents of two concepts, and obtains the similarity of the two concepts by combining the extent Tversky similarity with the intent Tversky similarity. Papers [22] and [23] observed the influence of the semantic relations of words (as attributes) on concept similarity. For two concepts, they calculated the similarity of the extents (based on set theory) and that of the intents (based on the semantic relations of words) separately, and integrated them in some proportion. For text contexts, and domain text contexts in particular, the semantic similarity of the two concepts' intents should be taken into account; it depends on the number of common words and on the semantic relationship between the non-common words.
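To make the extent/intent combination described above concrete, the following is a minimal sketch, not the method of [16] or of this paper. It applies the standard Tversky index, |A ∩ B| / (|A ∩ B| + α|A − B| + β|B − A|), directly to the extents and intents of two concepts and mixes the two scores with a weight `w`. The function names, the weight `w`, the default values of `alpha` and `beta`, and the toy concepts are all assumptions made for illustration; in particular, the sketch skips the meet-irreducible and join-irreducible decomposition used in [16] and ignores the semantic relations between non-common intent words.

```python
def tversky(a, b, alpha=0.5, beta=0.5):
    """Tversky index between two collections treated as sets.

    alpha and beta weight the elements found only in a and only in b.
    alpha = beta = 1 gives the Jaccard index; alpha = beta = 0.5 gives Dice.
    """
    a, b = set(a), set(b)
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    # Convention assumed here: two empty sets are treated as identical.
    return common / denom if denom else 1.0


def concept_similarity(c1, c2, w=0.5, alpha=0.5, beta=0.5):
    """Weighted mix of extent similarity and intent similarity.

    Each concept is an (extent, intent) pair; w is a hypothetical
    mixing proportion between the extent and intent scores.
    """
    (ext1, int1), (ext2, int2) = c1, c2
    ext_sim = tversky(ext1, ext2, alpha, beta)
    int_sim = tversky(int1, int2, alpha, beta)
    return w * ext_sim + (1 - w) * int_sim


# Toy concepts: extents are document ids, intents are words (made-up data).
c1 = ({"d1", "d2"}, {"lattice", "concept"})
c2 = ({"d2", "d3"}, {"lattice", "context"})
score = concept_similarity(c1, c2)
```

In this purely set-based sketch, the non-common intent words "concept" and "context" contribute nothing to the similarity no matter how closely related they are, which is the limitation the paragraph above points to for text contexts.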

MEMBER IMPORTANCE

SEMANTIC RELATIONS OF INTENTS

EXPERIMENTS AND EVALUATION

CONCLUSION AND FUTURE WORK