Abstract

The so-called Relevance Index (RI) metrics are a set of recently introduced indicators, based on information theory principles, that can be used to analyze complex systems by detecting the main interacting structures within them. Such structures can be described as subsets of the variables describing the system status that are strongly statistically correlated with one another and mostly independent of the rest of the system. The goal of the work described in this paper is to apply the same principles to pattern recognition and check whether the RI metrics can also identify, in a high-dimensional feature space, attribute subsets from which new features can be built and effectively used for classification. Preliminary results indicating that this is possible have been obtained using the RI metrics in a supervised way, i.e., by separately applying such metrics to homogeneous datasets comprising data instances which all belong to the same class, and iterating the procedure over all classes taken into consideration. In this work, we checked whether this would also be possible in a totally unsupervised way, i.e., by considering all available data at the same time, independently of the class to which they belong, under the hypothesis that the peculiarities of the variable sets that the RI metrics can identify correspond to the peculiarities by which data belonging to a certain class are distinguishable from data belonging to different classes. The results we obtained in experiments on some publicly available real-world datasets show that, especially when coupled with tree-based classifiers, the performance of an RI metrics-based unsupervised feature extraction method can be comparable to or better than that of other classical supervised or unsupervised feature selection or extraction methods.

Highlights

  • The Relevance Index (RI) metrics are based on information theory and are usually applied to the study of complex systems, since they are able to detect relevant groups of variables, well integrated among one another and well separated from the others, which provide a functional block description of the complex system they describe [1]

  • We compare the results obtained by ZIFF to other supervised or unsupervised feature extraction and selection methods using real-world data from three test problems

  • An interesting preliminary observation, which hints at how meaningful the features extracted by ZIFF are, is that, consistent with the structure of the data from which they have been extracted, they tend to correspond to contiguous regions and seem to play the same role that focus-of-attention algorithms play in computer vision


Summary

Introduction

Computation 2019, 7, 39

The Relevance Index (RI) metrics are based on information theory and are usually applied to the study of complex systems, since they are able to detect relevant groups of variables, well integrated among one another and well separated from the others, which provide a functional block description of the complex system they describe [1]. The zI(V) index of a set V of variables is a standardized version of the integration which measures the relative significance of V within a complex system for which V is a subset of the system state representation. It is computed based on the analysis of a sample of the system states observed over a given time interval, possibly in response to a previous perturbation. The properties measured by the RI metrics are not so different from the characteristics, namely relevance and non-redundancy, which are typical of the most discriminating feature sets describing patterns of interest in classification tasks. The properties that the RI metrics highlight in time series, when analyzing the dynamics of complex systems, can to some extent be likened to the properties of the multivariate distribution of static patterns subject to noise, distortions, etc.
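To make the idea of a standardized integration more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes discrete-valued observations and takes the integration of a variable set V to be the sum of the marginal entropies of its variables minus their joint entropy. The standardization step is an assumption made for illustration only: the reference distribution is obtained by independently shuffling each column, which may differ from the reference system used in the RI literature. All function names (`entropy`, `integration`, `z_integration`) are hypothetical.

```python
from collections import Counter

import numpy as np


def entropy(samples):
    """Shannon entropy (in bits) of the empirical distribution of the rows."""
    counts = np.array(list(Counter(map(tuple, samples)).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def integration(data, subset):
    """Sum of the marginal entropies of the variables in `subset`
    minus their joint entropy."""
    cols = data[:, subset]
    marginals = sum(entropy(cols[:, [j]]) for j in range(cols.shape[1]))
    return marginals - entropy(cols)


def z_integration(data, subset, n_null=200, seed=0):
    """Z-score of the observed integration against a shuffled reference.

    ASSUMPTION: the reference is built by permuting each column on its own,
    which destroys dependencies while keeping the marginals. This is an
    illustrative stand-in, not necessarily the paper's reference system.
    """
    rng = np.random.default_rng(seed)
    observed = integration(data, subset)
    null = []
    for _ in range(n_null):
        shuffled = np.column_stack([rng.permutation(data[:, j]) for j in subset])
        null.append(integration(shuffled, list(range(len(subset)))))
    null = np.asarray(null)
    # Small constant guards against a zero standard deviation.
    return (observed - null.mean()) / (null.std() + 1e-12)


# Toy data: variable 1 copies variable 0; variables 2 and 3 are independent.
rng = np.random.default_rng(42)
x0 = rng.integers(0, 2, size=500)
data = np.column_stack([x0, x0, rng.integers(0, 2, size=500),
                        rng.integers(0, 2, size=500)])

z_coupled = z_integration(data, [0, 1])
z_independent = z_integration(data, [2, 3])
print(z_coupled, z_independent)
```

In this toy setting the pair of coupled variables receives a far higher score than the pair of independent ones, which is the behaviour one would expect from an index meant to flag well-integrated variable groups.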

