Abstract

In classification tasks, the presence of irrelevant features can significantly degrade the performance of classification algorithms, in terms of additional processing time, more complex models, and the likelihood that the models generalize poorly due to overfitting. Practical applications of association rule mining often suffer from the overwhelming number of rules that are generated, many of which are not interesting or not useful for the application in question. Removing rules composed of irrelevant features can significantly improve overall performance. In this paper, we explore and compare the use of a feature selection measure to filter out unnecessary and irrelevant features/attributes prior to association rule generation. The experiments are performed on a number of real-world datasets with diverse characteristics. Empirical results confirm that applying feature subset selection prior to association rule generation eliminates a large number of rules containing irrelevant features. More importantly, the results reveal that removing rules that hold irrelevant features improves the accuracy rate while retaining the rule coverage rate of the resulting associative classification.
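The pipeline described above can be illustrated with a short sketch: rank attributes with a feature selection measure, keep a reduced subset, and mine association rules only from that subset. This is a minimal sketch, not the authors' exact procedure; it assumes scikit-learn, pandas, and mlxtend are available, uses mutual information as a stand-in for the paper's feature selection measure, and the file and column names ("dataset.csv", "class") are hypothetical placeholders.

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif
from mlxtend.frequent_patterns import apriori, association_rules

# Hypothetical discretized dataset with a "class" column.
df = pd.read_csv("dataset.csv")
X, y = df.drop(columns="class"), df["class"]

# Feature subset selection: score each attribute against the class label
# with mutual information (a stand-in for the paper's selection measure)
# and keep only attributes above a small threshold.
X_codes = X.apply(lambda col: col.astype("category").cat.codes)
scores = mutual_info_classif(X_codes, y, discrete_features=True, random_state=0)
selected = X.columns[scores > 0.01]

# Association rule generation on the reduced, one-hot encoded feature set.
items = pd.get_dummies(df[list(selected) + ["class"]].astype(str))
frequent = apriori(items, min_support=0.1, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.8)
print(f"{len(rules)} rules generated from {len(selected)} selected features")
```

Fewer input attributes means fewer candidate itemsets, so the rule set shrinks before any post-pruning of the rules themselves is needed.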

Highlights

  • Irrelevant and redundant attributes can contaminate real-world datasets.

  • The results indicate that feature subset selection discards a large number of insignificant attributes/features, eliminating many non-significant rules while preserving valuable high-accuracy, high-coverage rules for the classification problem.

  • This paper presents an empirical analysis of the usefulness and implications of applying feature subset selection prior to association rule generation, with respect to classification accuracy and rule coverage rate.


Summary

INTRODUCTION

Irrelevant and redundant attributes can contaminate a real-world dataset. These features can degrade performance and interfere with data mining processes, typically reducing the quality of the discovered rules/patterns. If a large number of attributes is present in a dataset, the data mining process is slowed down. To overcome these problems, it is important to find the necessary and sufficient subset of features so that association rule mining is optimal and no irrelevant features appear within the discovered rules; this prevents the generation of rules that include irrelevant and/or redundant attributes. The results indicate that feature subset selection discards a large number of insignificant attributes/features, eliminating many non-significant rules while preserving valuable high-accuracy, high-coverage rules for the classification problem.
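As an illustration of the accuracy and coverage criteria referred to above, the sketch below scores a single class association rule against a set of records. The rule representation, attribute names, and toy records are assumptions made for this example, not taken from the paper.

```python
from typing import Dict, List, Tuple

# A class association rule: (antecedent attribute=value pairs, class label).
Rule = Tuple[Dict[str, str], str]

def rule_accuracy_and_coverage(rule: Rule, records: List[Dict[str, str]]):
    """Coverage = fraction of records matching the antecedent;
    accuracy = fraction of those matches that also carry the rule's class."""
    antecedent, label = rule
    matched = [r for r in records
               if all(r.get(a) == v for a, v in antecedent.items())]
    coverage = len(matched) / len(records) if records else 0.0
    accuracy = (sum(r["class"] == label for r in matched) / len(matched)
                if matched else 0.0)
    return accuracy, coverage

# Toy records with hypothetical attribute names.
records = [{"alcohol": "high", "hue": "low",  "class": "1"},
           {"alcohol": "high", "hue": "high", "class": "1"},
           {"alcohol": "low",  "hue": "low",  "class": "2"}]
print(rule_accuracy_and_coverage(({"alcohol": "high"}, "1"), records))
# -> (1.0, 0.666...): perfectly accurate, covering two of three records
```

A rule built only from selected (relevant) attributes keeps both scores high, whereas rules containing irrelevant attributes tend to match few records and thus lose coverage.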
