Abstract
Measuring similarity is of great interest in many research areas, such as data science, machine learning, pattern recognition, text analysis and information retrieval, to name a few. The literature has shown that possibility is an attractive notion in the context of distinguishability assessment and can lead to very efficient and computationally inexpensive learning schemes. This paper focuses on determining the similarity between two possibility distributions. A review of existing similarity measures within the possibilistic framework is presented first. Then, the similarity measures are analyzed with respect to their capacity to satisfy a set of properties that a similarity measure should have. Most of the existing possibilistic similarity measures produce undesirable outcomes since they generally depend on the application context. A new similarity measure, called InfoSpecificity, is introduced, and the similarity measures are categorized into three main methods: morphic-based, amorphic-based and hybrid. Two experiments are conducted using four benchmark databases. The aim of the experiments is to compare the efficiency of the possibilistic similarity measures when applied to real data. The experiments show good results for the hybrid methods, particularly with the InfoSpecificity measure. In general, the hybrid methods outperform the other two categories when evaluated on small samples, i.e., in a poor-data context (or poorly informed environment), where possibility theory can be used to the greatest benefit.
Highlights
Determining similarities is part of a fundamental process of a human sense-making mechanism that consists of three elements: an object or event, a mental model, and an association between them [1]
The notion of similarity has been exploited in various fields of computer science [2]–[5], such as machine learning, pattern recognition [4], classification [6], image processing [7] and decision making [5]
We propose to group the possibilistic similarity measures into three categories: morphic-based measures, which evaluate the morphic aspect; amorphic-based measures, which are based on magnitude; and hybrid measures, which combine morphic and amorphic criteria to assess similarity
Summary
Determining similarities is part of a fundamental process of a human sense-making mechanism that consists of three elements: an object or event, a mental model, and an association between them [1]. The notion of similarity has been exploited in various fields of computer science [2]–[5], such as machine learning, pattern recognition [4], classification [6], image processing [7] and decision making [5]. In a machine learning context, similarity is required to compute the "closeness" between elements in a dataset. It makes it possible to understand the structure within the input data [8]. Refining the estimation of similarity scores improves the accuracy of algorithms and reduces errors and confusions.
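To make the idea of a similarity score between two possibility distributions concrete, the following is a minimal sketch of one simple distance-based score. It is not the InfoSpecificity measure, which this summary does not define, and it is not claimed to be one of the measures the paper reviews. It assumes a finite domain, distributions given as lists of possibility degrees in [0, 1] over the same ordered elements, and normalization (maximum degree equal to 1). The function names `is_normalized` and `manhattan_similarity` are hypothetical.

```python
from typing import Sequence


def is_normalized(pi: Sequence[float], tol: float = 1e-9) -> bool:
    """Check that all degrees lie in [0, 1] and that the maximum degree is 1."""
    return all(0.0 <= p <= 1.0 for p in pi) and abs(max(pi) - 1.0) <= tol


def manhattan_similarity(pi1: Sequence[float], pi2: Sequence[float]) -> float:
    """Illustrative score: 1 minus the mean absolute difference of the degrees.

    Assumption: this is only one simple way to score closeness; it is used
    here for illustration and is not the measure proposed in the paper.
    """
    if len(pi1) != len(pi2) or not pi1:
        raise ValueError("distributions must be non-empty and share one domain")
    if not (is_normalized(pi1) and is_normalized(pi2)):
        raise ValueError("each distribution needs degrees in [0, 1] with maximum 1")
    return 1.0 - sum(abs(a - b) for a, b in zip(pi1, pi2)) / len(pi1)


# Hypothetical distributions over a three-element domain.
pi_a = [1.0, 0.5, 0.0]
pi_b = [1.0, 0.4, 0.1]
pi_c = [0.0, 0.5, 1.0]

print(manhattan_similarity(pi_a, pi_b))  # close distributions score near 1
print(manhattan_similarity(pi_a, pi_c))  # dissimilar distributions score lower
```

A score of 1 means the two distributions are identical and lower values mean they differ more. A score of this kind compares degrees element by element, so it is sensitive to how the domain elements are aligned between the two distributions.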