Similarity measures are essential for solving many pattern recognition problems, such as classification, clustering, and retrieval. Various distance and similarity measures are applicable to comparing two probability density functions. Data comparison is widely used today and plays an important role. Comparing two objects is a common task for people from all walks of life, who often want or need to find the similarity between two different objects or the difference between two similar objects. Different data may share some similarity in one or more given attributes. To compare two datasets by their attributes using classification algorithms, the relevant attributes must be selected by rules; a system that does this is known as a rule-based reasoning system or expert system, which classifies a given test instance into a particular outcome using the learned rules. The test instance carries multiple attributes, which are usually the values of diagnostic tests. In this article, we propose a classifier ensemble-based method for comparing two datasets, or one dataset with different features. Ensemble data mining learning methods are applied for rule generation, and a multi-criterion evaluation approach is used to select reliable rules from the results of the ensemble methods. The efficacy of the proposed methodology is illustrated with an example of two disease datasets; strictly speaking, these form a single combined dataset that shares the same instances and ordinary attributes and differs only in the class attribute. This article introduces a fuzzy rule-based classification method called FURIA to derive the relationship between the two datasets from FURIA rules and to find the similarity between them.