Abstract

In this paper, we propose a method that is able to derive rules involving range associations from numerical attributes, and to use such rules to build comprehensible classification and characterization (data summary) models. Our approach follows the classification association rule mining paradigm, where rules are generated in a way similar to association rule mining, but search is guided by rule consequents. This allows many credible rules, not just some dominant rules, to be mined from the data to build models. In so doing, we propose several sub-range analysis and rule formation heuristics to deal with numerical attributes. Our experiments show that our method is able to derive range-based rules that offer both accurate classification and comprehensible characterization for numerical data.

Highlights

  • In many practical applications, it is desirable to be able to extract the following type of rule from numerical data: age ∈ [25, 30] ∧ loan ∈ [2000, 3000] ⇒ repay = yes. That is, we derive rules that contain ranges in their antecedents and a categorical value as the consequent.

  • A number of datasets selected from the UCI repository [9] are used in the experiments. These datasets are among the most popular in the research community for studying classification, and they vary in the number of tuples and attributes, the nature of their numerical attributes, and the number of class labels.

  • Our work adopts the classification association rule mining (CARM) methodology [6]. This effectively allows multiple models to be discovered from the data and used as a type of ensemble model for classification and characterization.
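
To make the rule form in the first highlight concrete, the following is a minimal sketch of how a range-based rule such as age ∈ [25, 30] ∧ loan ∈ [2000, 3000] ⇒ repay = yes could be represented and scored against a table of tuples. It is an illustration only, not the paper's method: the class name RangeRule, the function support_confidence, and the toy rows are all hypothetical, and support and confidence are the standard association-rule measures, which may differ from the measures the paper actually uses.

```python
from dataclasses import dataclass


@dataclass
class RangeRule:
    """Hypothetical container for a rule with closed ranges in the antecedent."""
    antecedent: dict      # attribute name -> (low, high), both ends inclusive
    class_attr: str       # name of the categorical consequent attribute
    class_value: str      # predicted value of that attribute

    def covers(self, row):
        # A row is covered when every antecedent attribute falls in its range.
        return all(lo <= row[attr] <= hi
                   for attr, (lo, hi) in self.antecedent.items())


def support_confidence(rule, rows):
    """Standard association-rule support and confidence for one rule."""
    covered = [r for r in rows if rule.covers(r)]
    correct = [r for r in covered if r[rule.class_attr] == rule.class_value]
    support = len(correct) / len(rows) if rows else 0.0
    confidence = len(correct) / len(covered) if covered else 0.0
    return support, confidence


# Toy data invented for this sketch (not taken from the paper).
rows = [
    {"age": 27, "loan": 2500, "repay": "yes"},
    {"age": 29, "loan": 2900, "repay": "yes"},
    {"age": 26, "loan": 2100, "repay": "no"},
    {"age": 45, "loan": 2500, "repay": "yes"},   # age outside [25, 30]
    {"age": 28, "loan": 5000, "repay": "no"},    # loan outside [2000, 3000]
]

rule = RangeRule({"age": (25, 30), "loan": (2000, 3000)}, "repay", "yes")
support, confidence = support_confidence(rule, rows)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

On this toy table the rule covers three tuples, two of which have repay = yes. A CARM-style miner would keep many such rules per class rather than a single dominant one; how ranges are chosen and how rules are combined is left to the paper's heuristics and is not modeled here.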


Summary

Introduction

In the process industry, performance data is often analyzed to help determine how engineering processes may be optimized. Such data typically contains a large number of numerical attributes, and it is useful to be able to extract range-based rules that describe the relationships among the variables, so that causality can be understood naturally and processes tuned. This is repeated on the remaining data until all the data is covered this way. This strategy works well with categorical data, but is not effective when dealing with numerical attributes, because there is a potentially very large number of ways to form ranges and to cover the data. These methods resort to discretization or point-based splits. These mechanisms may not capture some relevant ranges and do not help in understanding the discovered rules [2].

