Abstract

Discretisation often forms part of the initial data preparation stage. It translates the continuous domain of features into a granular one by assigning a number of intervals, so that attribute values are represented by nominal categories. Typically, all real-valued features are transformed, regardless of their characteristics. This paper presents research on discretisation executed with a more discerning approach. Feature selection mechanisms were applied to all available attributes, in the form of rankings that order variables by their importance. Exploiting this discovered knowledge about the attributes, discretisation was then driven by a ranking, and either the highest- or lowest-ranking features were selected for transformation. The influence of selective discretisation on the performance of classification systems was studied for three popular inducers. The procedure was employed in the field of stylometry, for the task of authorship recognition, treated as binary classification with balanced classes. The experiments show that discretisation based on the importance of features can lead to better performance than transformations applied to all attributes.
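The ranking-driven procedure described above can be illustrated with a minimal sketch. The abstract does not name the ranking functions, the discretisation algorithms, or the inducers used, so everything concrete here is an assumption made for illustration: the importance score (difference of class means scaled by the standard deviation), the equal-width binning, and the hypothetical helper names `rank_features`, `equal_width_bins`, and `selective_discretise`.

```python
from statistics import mean, pstdev


def rank_features(rows, labels):
    """Return feature indices ordered from most to least important.

    Assumption: importance is the absolute difference between the two
    class means, scaled by the overall standard deviation. The paper's
    actual ranking mechanisms are not stated in the abstract.
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        class0 = [v for v, y in zip(column, labels) if y == 0]
        class1 = [v for v, y in zip(column, labels) if y == 1]
        spread = pstdev(column) or 1.0  # guard against constant features
        scores.append(abs(mean(class0) - mean(class1)) / spread)
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def equal_width_bins(column, n_bins):
    """Map real values to interval indices 0..n_bins-1 (equal-width binning)."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range
    return [min(int((v - lo) / width), n_bins - 1) for v in column]


def selective_discretise(rows, labels, k, n_bins=3, from_top=True):
    """Discretise only the k highest- or lowest-ranked features.

    All other features keep their original real values.
    """
    ranking = rank_features(rows, labels)
    picked = ranking[:k] if from_top else ranking[len(ranking) - k:]
    chosen = set(picked)
    columns = list(zip(*rows))
    new_columns = [
        equal_width_bins(col, n_bins) if j in chosen else list(col)
        for j, col in enumerate(columns)
    ]
    return [list(row) for row in zip(*new_columns)], ranking


# Toy data: two features, binary balanced classes (made-up values).
rows = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [1.0, 2.0]]
labels = [0, 0, 1, 1]

top_only, ranking = selective_discretise(rows, labels, k=1, n_bins=2)
bottom_only, _ = selective_discretise(rows, labels, k=1, n_bins=2, from_top=False)
```

In this toy setup, `top_only` has only the highest-ranked feature replaced by interval indices, while `bottom_only` transforms only the lowest-ranked one. Either variant could then be passed to a classifier and compared against discretising every attribute, which is the comparison the abstract reports.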
