Importance of Characteristic Features and Their Form for Data Exploration.

Urszula Stańczyk, Beata Zielosko, Grzegorz Baron

doi:10.3390/e26050404

Open Access

https://doi.org/10.3390/e26050404

Journal: Entropy
Publication Date: May 6, 2024
License type: CC BY 4.0

Affiliation: University of Technology, University of Silesia

Abstract

The nature of the input features is one of the key factors determining which tools, methods, or approaches can be used in a knowledge discovery process. Depending on the characteristics of the available attributes, some techniques may perform unsatisfactorily, or may not run at all without additional preprocessing steps. The types of variables and their domains affect performance, and any change to their form can influence it as well, or even make some learners applicable. The relevance of features for a task is another element with a noticeable impact on data exploration. The importance of attributes can be estimated with mechanisms from the feature selection and reduction area, such as rankings. In the described research framework, the data form was conditioned on relevance through the proposed procedure of gradual discretisation controlled by a ranking of attributes. Supervised and unsupervised discretisation methods were applied to datasets from the stylometric domain, for the task of binary authorship attribution. For the selected classifiers, extensive tests were performed, and they indicated many cases of enhanced prediction for partially discretised datasets.
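The idea of gradual discretisation controlled by a ranking can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `equal_width_bins` and `gradual_discretisation`, the choice of equal-width binning as the unsupervised method, the number of bins, and the direction in which the ranking is processed are all assumptions made for illustration only.

```python
from typing import Iterator, List, Sequence, Tuple


def equal_width_bins(values: Sequence[float], n_bins: int = 3) -> List[int]:
    """Unsupervised equal-width discretisation of one attribute (assumed method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute collapses into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # Clamp the index so the maximum value lands in the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def gradual_discretisation(
    rows: Sequence[Sequence[float]],
    ranking: Sequence[int],
    n_bins: int = 3,
) -> Iterator[Tuple[int, List[list]]]:
    """Yield (k, dataset) pairs in which the first k ranked attributes are discretised.

    `ranking` lists column indices in the order they should be transformed
    (hypothetical convention; the processing order is an assumption).
    """
    columns = [list(col) for col in zip(*rows)]
    # Step 0: the fully continuous dataset.
    yield 0, [list(r) for r in rows]
    for k, attr in enumerate(ranking, start=1):
        columns[attr] = equal_width_bins(columns[attr], n_bins)
        # Transpose back to rows: a partially discretised variant of the data.
        yield k, [list(r) for r in zip(*columns)]


if __name__ == "__main__":
    data = [[0.1, 5.0], [0.5, 7.0], [0.9, 9.0]]
    for k, variant in gradual_discretisation(data, ranking=[1, 0]):
        print(k, variant)
```

Each yielded variant could then be passed to a classifier, so that prediction quality is compared across datasets with zero, some, or all attributes discretised.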

Full Text