The linguistic summarization and the interpretability, scalability of fuzzy representations of multilevel semantic structures of word-domains

Cat Ho Nguyen, Thi Lan Pham, Tu N Nguyen, Cam Ha Ho, Thu Anh Nguyen

doi:10.1016/j.micpro.2020.103641

Abstract

The effectiveness of the linguistic (L-) summaries mined from a given dataset D by a human-made method M depends strongly on the fuzzy sets constructed to represent the L-words of the dataset attributes. One can observe that the semantics of words is objective (commonly understood in the same way by human experts) and that the word-domains of dataset attributes have their own inherent semantic structures. This suggests that, to limit intuitive human influence on such constructions, the fuzzy set (fs-) representations of the declared word-sets should be isomorphic images of the word-sets they represent. In this study, such fs-representations of word-domains are called interpretable, following the concept of interpretability in the mathematical-logical theories of A. Tarski et al.; that is, the inherent semantic structures of the declared word-sets must be interpretable in the structures of their fs-representations. With this new feature, the study proposes a data-summarization method that can reveal the L-distributions of fuzzy groups of objects, represented by a given dataset, with respect to a desired L-attribute of the dataset. The set of all linguistic summaries (LSs) mined in this way satisfies essential properties specific to ordinary human L-knowledge, including the scalability of the word-sets of its current attributes and of the current knowledge itself. An experimental study using the Bank Marketing dataset from the UCI Machine Learning Repository is performed to show the specific advantages of the proposed method.

Full Text