Abstract

A similarity measure evaluates the degree of similarity between two fuzzy data sets and has become an essential tool in many applications, including data mining, pattern recognition, and clustering. In this paper, we propose a similarity measure capable of handling non-overlapped data as well as overlapped data, and we analyze its behavior on different data distributions. We first design the similarity measure based on a distance measure and apply it to overlapped data distributions. Calculations on example data distributions show that, although the measure is effective for overlapped data, it cannot distinguish between two non-overlapped data distributions: it yields the same value for both data sets. To obtain discriminative similarity values for non-overlapped data, we consider two approaches. The first is to apply a conventional similarity measure after preprocessing the non-overlapped data. The second is to take neighbor data information into account when designing the similarity measure, where we consider the relation to specific data and residual data information. Two artificial patterns of non-overlapped data are analyzed in an illustrative example. The calculation results demonstrate that the proposed similarity measures can discriminate non-overlapped data.

Highlights

  • Within research on fuzzy set theory, numerous researchers have analyzed data uncertainty through fuzzy data sets [6,7,8]. Within that work, the similarity measure design problem (i.e., the design of a measure evaluating the degree of similarity between two data sets [8,9,10,11,12,13,14,15]) has attracted increasing attention from the research community because of its growing number of applications, including data mining, pattern recognition, and clustering [16,17]

  • Calculation results have shown that they are effective for overlapped data

  • We have concluded that we cannot directly apply those similarity measures designed for overlapped data to non-overlapped data


Summary

Introduction

Within research on fuzzy set theory, numerous researchers have analyzed data uncertainty through fuzzy data sets [6,7,8]. Within that work, the similarity measure design problem (i.e., the design of a measure evaluating the degree of similarity between two data sets [8,9,10,11,12,13,14,15]) has attracted increasing attention from the research community because of its growing number of applications, including data mining, pattern recognition, and clustering [16,17]. If we design a similarity measure based on a distance measure, we can use a general fuzzy membership function without any restriction on its shape [13,14,15].
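To make the idea of a distance-based similarity measure concrete, the following is a minimal sketch and not the measure proposed in the paper. It assumes that each fuzzy set is given as a list of membership values in [0, 1] sampled on a shared universe, that the distance is the normalized Hamming distance, and that similarity is taken as one minus that distance. The function names `hamming_distance` and `similarity` are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def hamming_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized Hamming distance between two membership vectors.

    Assumption: both vectors sample membership values in [0, 1]
    at the same points of a shared, non-empty universe.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("membership vectors must be non-empty and equal in length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Illustrative similarity: complement of the normalized Hamming distance.

    This is one common construction, assumed here for illustration only.
    No assumption is made about the shape of the membership functions.
    """
    return 1.0 - hamming_distance(a, b)


if __name__ == "__main__":
    # Two arbitrary membership vectors on a three-point universe.
    set_a = [0.2, 0.8, 0.5]
    set_b = [0.4, 0.6, 0.5]
    print(similarity(set_a, set_b))
```

Because the computation only compares membership values point by point, it places no constraint on the shape of the membership function, which is the property the paragraph above refers to.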

