Ontology-based Similarity Measures Research Articles

Human language is fuzzy by nature, with words meaning different things to different people depending on the context. Fuzzy words are words with a subjective meaning; they are common in everyday natural language dialogue, are often ambiguous and vague, and depend on an individual’s perception. Fuzzy Sentence Similarity Measures (FSSM) are algorithms that compare two or more short texts containing human-perception-based words and return a numeric measure of the similarity of meaning between them. This paper proposes a new FSSM called FUSE (FUzzy Similarity mEasure) to assess an individual’s perception within an FSSM. FUSE is an ontology-based similarity measure that uses Interval Type-2 fuzzy sets to model relationships between categories of human perception-based words. The FUSE algorithm has been developed over four versions and evaluated on several published and newly created datasets. Results have typically shown that calculating the semantic similarity of two short texts using FUSE gives a higher correlation with the average human ratings (AHR) than traditional sentence similarity measures that do not consider the presence of fuzzy words. This paper focuses on the second version of the FUSE algorithm, referred to as FUSE_2.0, which has been compared to several state-of-the-art semantic similarity measures (SSM), including the only published FSSM, FAST (Fuzzy Algorithm for Similarity Testing), which has a limited dictionary of fuzzy words and uses Type-1 fuzzy sets to model relationships between categories of human perception-based words. Results have shown that FUSE_2.0 achieves a higher correlation with the AHR than traditional SSMs and FAST. The key contributions of this work can be summarised as follows: the development of a new methodology to model fuzzy words using Interval Type-2 fuzzy sets. This has led to the creation of a fuzzy dictionary for nine fuzzy categories, a useful resource which can be used by other researchers in the field of natural language processing and Computing with Words (CWW) with other fuzzy applications such as semantic clustering.
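
To make the idea of modelling a fuzzy word with an Interval Type-2 fuzzy set more concrete, here is a minimal Python sketch. It is not the FUSE algorithm: the abstract does not give the membership functions, the scale, or the way words are compared, so the triangular shapes, the 0 to 10 scale, the example words, and the Jaccard-style ratio used to compare two words are all assumptions made for illustration.

```python
from dataclasses import dataclass


def tri(x, a, b, c, height=1.0):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return height * (x - a) / (b - a)
    return height * (c - x) / (c - b)


@dataclass(frozen=True)
class IT2Word:
    """A word modelled by a lower and an upper membership function.

    The region between the two functions represents disagreement
    between people about what the word means.
    """
    name: str
    upper: tuple  # (a, b, c) of the upper membership function
    lower: tuple  # (a, b, c, height) of the lower membership function

    def membership(self, x):
        """Return the membership interval (lower, upper) at point x."""
        return tri(x, *self.lower), tri(x, *self.upper)


def jaccard(w1, w2, grid):
    """Jaccard-style ratio of overlap to union, summed over a sample grid.

    Assumption: chosen here only as one common way to compare two
    interval type-2 sets; the abstract does not say what FUSE uses.
    """
    num = den = 0.0
    for x in grid:
        lo1, hi1 = w1.membership(x)
        lo2, hi2 = w2.membership(x)
        num += min(hi1, hi2) + min(lo1, lo2)
        den += max(hi1, hi2) + max(lo1, lo2)
    return num / den if den else 0.0


# Hypothetical words on a hypothetical 0-10 temperature scale.
GRID = [i / 10 for i in range(101)]
COLD = IT2Word("cold", (0, 2, 4), (1, 2, 3, 0.8))
WARM = IT2Word("warm", (4, 6, 8), (5, 6, 7, 0.8))
HOT = IT2Word("hot", (6, 8, 10), (7, 8, 9, 0.8))
```

With these made-up parameters, `jaccard(WARM, HOT, GRID)` is larger than `jaccard(WARM, COLD, GRID)`, because only the first pair has overlapping upper membership functions.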


Background: Determining similarity between two individual concepts or two sets of concepts extracted from a free text document is important for various aspects of biomedicine, for instance, to find prior clinical reports for a patient that are relevant to the current clinical context. Using simple concept matching techniques, such as lexicon-based comparisons, is typically not sufficient to determine an accurate measure of similarity.

Methods: In this study, we tested an enhancement to the standard document vector cosine similarity model in which ontological parent–child (is-a) relationships are exploited. For a given concept, we define a semantic vector consisting of all parent concepts and their corresponding weights as determined by the shortest distance between the concept and parent after accounting for all possible paths. Similarity between the two concepts is then determined by taking the cosine of the angle between the two corresponding vectors. To test the improvement over the non-semantic document vector cosine similarity model, we measured the similarity between groups of reports arising from similar clinical contexts, including anatomy and imaging procedure. We further applied the similarity metrics within a k-nearest-neighbor (k-NN) algorithm to classify reports based on their anatomical and procedure-based groups. 2150 production CT radiology reports (952 abdomen reports and 1128 neuro reports) were used in testing with SNOMED CT, restricted to Body structure, Clinical finding and Procedure branches, as the reference ontology.

Results: The semantic algorithm preferentially increased the intra-class similarity over the inter-class similarity, with a 0.07 and 0.08 mean increase in the neuro–neuro and abdomen–abdomen pairs versus a 0.04 mean increase in the neuro–abdomen pairs. Using leave-one-out cross-validation in which each document was iteratively used as a test sample while excluding it from the training data, the k-NN based classification accuracy was shown in all cases to be consistently higher with the semantics-based measure compared with the non-semantic case. Moreover, the accuracy remained steady even as the value of k was increased: for the two anatomy-related classes, accuracy for k=41 was 93.1% with semantics compared to 86.7% without semantics. Similarly, for the eight imaging-procedure-related classes, accuracy (for k=41) with semantics was 63.8% compared to 60.2% without semantics. At the same k, accuracy improved significantly to 82.8% and 77.4% respectively when procedures were logically grouped together into four classes (such as ignoring contrast information in the imaging procedure description). Similar results were seen at other k-values.

Conclusions: The addition of semantic context into the document vector space model improves the ability of the cosine similarity to differentiate between radiology reports of different anatomical and image procedure-based classes. This effect can be leveraged for document classification tasks, which suggests its potential applicability for biomedical information retrieval.
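
The semantic-vector construction described in this abstract can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' implementation: the toy is-a hierarchy stands in for SNOMED CT, the weight `1 / (1 + d)` for an ancestor at shortest distance `d` is an assumed decay function (the abstract does not state the formula), and including the concept itself at distance 0 is likewise an assumption.

```python
from collections import deque
from math import sqrt

# Hypothetical toy is-a hierarchy (child -> list of parents); not SNOMED CT.
PARENTS = {
    "left kidney": ["kidney"],
    "right kidney": ["kidney"],
    "kidney": ["abdominal organ"],
    "liver": ["abdominal organ"],
    "abdominal organ": ["body structure"],
    "brain": ["body structure"],
    "body structure": [],
}


def semantic_vector(concept, parents=PARENTS):
    """Weight the concept and each ancestor by its shortest is-a distance."""
    dist = {concept: 0}
    queue = deque([concept])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            # Breadth-first search: the first visit is the shortest path,
            # so multiple paths to the same ancestor are handled correctly.
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    # Assumed decay: closer ancestors receive larger weights.
    return {c: 1.0 / (1 + d) for c, d in dist.items()}


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def concept_similarity(a, b):
    """Similarity of two concepts via their semantic vectors."""
    return cosine(semantic_vector(a), semantic_vector(b))
```

In this toy hierarchy, "left kidney" and "right kidney" share three ancestors and therefore score higher than "left kidney" and "brain", which share only the root. Exact concept matching would give zero for both pairs, which is the limitation the semantic vector is meant to address.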


Articles published on Ontology-based Similarity Measures

An Interval Type-2 Fuzzy Ontological Similarity Measure

Multi-Ontology Refined Embeddings (MORE): A hybrid multi-ontology and corpus-based semantic representation model for biomedical concepts.

The ReCAP Project

Mapping Crisp Structural Semantic Similarity Measures to Fuzzy Context: A Generic Approach

Towards an ontology-supported case-based reasoning approach for computer-aided tolerance specification

Investigating the impact human protein–protein interaction networks have on disease-gene analysis

A novel family of IC-based similarity measures with a detailed experimental survey on WordNet

A new family of information content models with an experimental survey on WordNet

A computational framework for the prioritization of disease-gene candidates

Clustering rule bases using ontology-based similarity measures

Towards the estimation of feature-based semantic similarity using multiple ontologies

An ontology-based similarity measure for biomedical data – Application to radiology reports

Ontology-based semantic similarity: A new feature-based approach

Computing With Words With the Ontological Self-Organizing Map

Ontology based Similarity Measure in Document Ranking
