Human language is inherently fuzzy: words mean different things to different people, depending on the context. Fuzzy words are words with a subjective meaning. They are common in everyday natural language dialogue, are often ambiguous and vague, and are interpreted according to an individual's perception. Fuzzy Sentence Similarity Measures (FSSMs) are algorithms that compare two or more short texts containing human perception-based words and return a numeric measure of the similarity of meaning between them. This paper proposes a new FSSM called FUSE (FUzzy Similarity mEasure), which accounts for an individual's perception within an FSSM. FUSE is an ontology-based similarity measure that uses Interval Type-2 fuzzy sets to model relationships between categories of human perception-based words. The FUSE algorithm has been developed over four versions and evaluated on several published and newly created datasets. In general, results have shown that calculating the semantic similarity of two short texts with FUSE gives a higher correlation with average human ratings (AHR) than traditional sentence similarity measures that do not account for the presence of fuzzy words. This paper focuses on the second version of the algorithm, FUSE_2.0, which is compared with several state-of-the-art semantic similarity measures (SSMs), including FAST (Fuzzy Algorithm for Similarity Testing). FAST, the only previously published FSSM, has a limited dictionary of fuzzy words and uses Type-1 fuzzy sets to model relationships between categories of human perception-based words. Results show that FUSE_2.0 achieves a higher correlation with AHR than both traditional SSMs and FAST. The key contributions of this work are as follows: the development of a new methodology to model fuzzy words using Interval Type-2 fuzzy sets.
This has led to the creation of a fuzzy dictionary covering nine fuzzy categories, a resource that other researchers in natural language processing and Computing with Words (CWW) can use in other fuzzy applications, such as semantic clustering.
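The abstract does not specify FUSE's membership functions or how it combines them into a sentence score. As a rough illustration of what modelling fuzzy words with Interval Type-2 (IT2) fuzzy sets involves, the Python sketch below represents each word by a trapezoidal upper and lower membership function (the area between them is the footprint of uncertainty) and compares two words with the Jaccard similarity measure commonly used for IT2 fuzzy sets. The words, the 0-10 "size" scale, the trapezoid parameters and the names `IT2FuzzySet` and `jaccard_similarity` are all hypothetical, and this is not the FUSE algorithm itself.

```python
from dataclasses import dataclass

# (a, b, c, d) corner points of a trapezoid on the word's numeric scale.
Trapezoid = tuple[float, float, float, float]


def trapezoid(x: float, a: float, b: float, c: float, d: float, height: float = 1.0) -> float:
    """Trapezoidal membership: rises on [a, b], stays at `height` on [b, c], falls on [c, d]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return height
    if x < b:
        return height * (x - a) / (b - a)
    return height * (d - x) / (d - c)


@dataclass(frozen=True)
class IT2FuzzySet:
    """Interval Type-2 fuzzy set for one (hypothetical) fuzzy word.

    The membership of x is an interval [lower(x), upper(x)]; the area between
    the lower and upper membership functions is the footprint of uncertainty,
    which captures disagreement between people about what the word means.
    """
    name: str
    upper: Trapezoid           # upper membership function (UMF), peak height 1
    lower: Trapezoid           # lower membership function (LMF), nested inside the UMF
    lower_height: float = 1.0  # LMF peak height, at most 1

    def membership(self, x: float) -> tuple[float, float]:
        """Return the membership interval (lower, upper) at x."""
        return (trapezoid(x, *self.lower, height=self.lower_height),
                trapezoid(x, *self.upper))


def jaccard_similarity(A: IT2FuzzySet, B: IT2FuzzySet,
                       lo: float = 0.0, hi: float = 10.0, steps: int = 1001) -> float:
    """Jaccard similarity of two IT2 fuzzy sets, approximated on a uniform grid over [lo, hi]."""
    num = den = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        la, ua = A.membership(x)
        lb, ub = B.membership(x)
        num += min(ua, ub) + min(la, lb)  # overlap of upper and lower functions
        den += max(ua, ub) + max(la, lb)  # union of upper and lower functions
    return num / den if den > 0 else 0.0


# Hypothetical words on a hypothetical 0-10 "size" scale.
tiny = IT2FuzzySet("tiny", upper=(0.0, 0.0, 1.0, 3.0), lower=(0.0, 0.0, 0.5, 2.0), lower_height=0.8)
small = IT2FuzzySet("small", upper=(0.0, 1.0, 3.0, 5.0), lower=(0.5, 1.5, 2.5, 4.0), lower_height=0.8)
large = IT2FuzzySet("large", upper=(5.0, 7.0, 10.0, 10.0), lower=(6.0, 8.0, 10.0, 10.0), lower_height=0.8)

if __name__ == "__main__":
    for other in (tiny, large):
        print(f"sim(small, {other.name}) = {jaccard_similarity(small, other):.3f}")
```

Because the footprints of "small" and "large" do not overlap in this toy setup, their similarity is 0, while "small" and "tiny" share part of their footprints and score strictly between 0 and 1. A word compared with itself scores 1.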