Abstract

Typically, discretisation procedures are implemented as part of the initial pre-processing of data, before knowledge mining is employed. This means that conclusions and observations are based on reduced data, as discretisation usually discards some information. The paper presents a different approach, which takes advantage of discretisation executed after data mining. In the described study, decision rules were first induced from real-valued features. Secondly, the data sets were discretised. In the third step, using the categories found for the attributes, the conditions included in the inferred rules were translated into the discrete domain. The properties and performance of the rule classifiers were tested in the domain of stylometric analysis of texts, where writing styles were defined through quantitative attributes of a continuous nature. The performed experiments show that the proposed processing leads to sets of rules with significantly reduced sizes while maintaining the quality of predictions, and allows many data discretisation methods to be tested at acceptable computational cost.

Highlights

  • Plenty of observed phenomena, objects, and problems are expressed through features that are real-valued

  • Decision rules store information about patterns detected in data by listing conditions for features that lead to class labels

  • Section Background and related works explains motivation leading to research and presents theoretical background, with descriptions of rough set processing applied to data, specifics of characteristic features in stylometric domain, and discretisation approaches


Summary

Introduction

Plenty of observed phenomena, objects, and problems are expressed through features that are real-valued. Once all knowledge about the described concepts is discovered, we would like it to be represented in a form that discards all unimportant, unnecessary details and keeps broader general categories, which can be obtained through discretisation. To achieve these two aims, the paper proposes a new approach, which can be applied to inducers that are capable of working with both continuous and nominal attributes and that allow easy access to the structures representing learned knowledge, such as rule classifiers. Decision rules store information about patterns detected in data by listing conditions for features that lead to class labels. This transparent form enhances understanding, and is one of the reasons why rule classification systems are often preferred as inducers [6]. Section Background and related works explains the motivation leading to the research and presents the theoretical background, with descriptions of rough set processing applied to data, the specifics of characteristic features in the stylometric domain, and discretisation approaches. Section Concluding remarks contains the summary and conclusions, and indicates directions for future research.
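
The three steps named in the abstract (rules induced from real-valued features, discretisation of the data, translation of rule conditions into the discrete domain) can be illustrated with a minimal sketch. It is not the authors' implementation. The attribute name `freq_the`, the cut points, the rules, and the half-open interval form of the conditions are all hypothetical. Merging rules that become identical after translation is an assumption about one way the rule set could shrink; the source does not state the mechanism.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """Hypothetical rule condition: low <= attribute value < high."""
    attribute: str
    low: float
    high: float


@dataclass(frozen=True)
class Rule:
    """A decision rule: a conjunction of conditions leading to a class label."""
    conditions: tuple
    label: str


def translate_condition(cond, cuts):
    """Map a real-valued interval onto the indices of the bins it overlaps.

    `cuts` are sorted cut points; bin i covers [cuts[i-1], cuts[i]).
    Assumes cond.low < cond.high.
    """
    first = bisect_right(cuts, cond.low)   # bin that contains `low`
    last = bisect_left(cuts, cond.high)    # bin just below the exclusive `high`
    return cond.attribute, frozenset(range(first, last + 1))


def translate_rules(rules, cuts_by_attribute):
    """Translate every rule and drop rules that become identical (assumption)."""
    translated = [
        (tuple(translate_condition(c, cuts_by_attribute[c.attribute])
               for c in r.conditions), r.label)
        for r in rules
    ]
    # dict.fromkeys removes duplicates while keeping the original order
    return list(dict.fromkeys(translated))


# Hypothetical cut points found by some discretisation method (step 2).
cuts_by_attribute = {"freq_the": [0.2, 0.5]}

# Hypothetical rules induced from real-valued data (step 1).
rules = [
    Rule((Condition("freq_the", 0.25, 0.45),), "author_A"),
    Rule((Condition("freq_the", 0.30, 0.50),), "author_A"),
    Rule((Condition("freq_the", 0.55, float("inf")),), "author_B"),
]

# Step 3: express the conditions in terms of the discrete categories.
discrete = translate_rules(rules, cuts_by_attribute)
for conditions, label in discrete:
    print(conditions, "->", label)
```

In this toy case the first two rules fall entirely within the same category and collapse into one discrete rule, so three real-valued rules become two. Because only the cut points change between discretisation methods, the same induced rules can be re-translated for each method without repeating the induction step.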

Background and related works
Summary of the obtained results
Concluding remarks
