Abstract

Data pre-processing is a major difficulty in the knowledge discovery process, particularly feature selection on large volumes of data. Various approaches have been suggested in the literature to overcome this difficulty. Unlike most of them, Rough Set Theory (RST) can discover data dependencies and reduce the number of attributes without requiring additional information. In RST, the discernibility matrix is the mathematical foundation for computing such reducts. Although RST has proved efficient for feature selection, it is computationally expensive on high-dimensional data: finding the minimal subset of attributes requires examining a number of candidate subsets that grows exponentially with the number of attributes. Many RST enhancements have been proposed to overcome this limitation. In contrast to recent methods, this paper implements RST concepts iteratively in the R language. First, the dataset is partitioned into a number of smaller subsets, and each subset is processed independently to generate its own minimal attribute set. Within each iteration, only the minimal elements of the discernibility matrix are considered. Finally, the outputs of the iterations are compared, and the attributes common to all reducts form the minimal set (the Core attributes). The approach was compared with another recently proposed algorithm on three benchmark datasets and produced the same minimal attribute sets in less execution time.
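As a rough illustration of the iterated procedure described in the abstract, the following Python sketch partitions a small decision table, builds the discernibility entries of each partition, keeps only the minimal entries, derives one attribute set per partition, and intersects the results. It is not the authors' R implementation: the function names, the greedy hitting-set step used to obtain a reduct, and the toy data are all assumptions made for this example.

```python
from itertools import combinations


def discernibility_entries(rows, decisions):
    """Attribute-index sets that distinguish each pair of rows with different decisions."""
    entries = set()
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(k for k, (x, y) in enumerate(zip(a, b)) if x != y)
            if diff:
                entries.add(diff)
    return entries


def minimal_elements(entries):
    """Keep only entries that have no proper subset among the entries."""
    return {e for e in entries if not any(other < e for other in entries)}


def greedy_reduct(entries):
    """Assumed heuristic: greedily pick attributes until every entry is hit.

    The result intersects every entry but is not guaranteed to be the
    smallest such set.
    """
    remaining = set(entries)
    chosen = set()
    while remaining:
        counts = {}
        for entry in remaining:
            for attr in entry:
                counts[attr] = counts.get(attr, 0) + 1
        # Ties are broken in favour of the lowest attribute index.
        best = max(sorted(counts), key=counts.get)
        chosen.add(best)
        remaining = {e for e in remaining if best not in e}
    return chosen


def iterated_common_attributes(rows, decisions, n_parts):
    """Reduce each partition independently, then keep the attributes common to all."""
    size = -(-len(rows) // n_parts)  # ceiling division
    reducts = []
    for start in range(0, len(rows), size):
        part_rows = rows[start:start + size]
        part_decisions = decisions[start:start + size]
        entries = minimal_elements(discernibility_entries(part_rows, part_decisions))
        reducts.append(greedy_reduct(entries))
    return set.intersection(*reducts) if reducts else set()


# Toy decision table (hypothetical): three condition attributes, one decision.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
decisions = [0, 1, 0, 1]
print(iterated_common_attributes(rows, decisions, n_parts=2))
```

In this toy table only the attribute at index 1 separates the two decision classes, so both partitions select it. Note that in this sketch a partition containing a single decision class yields an empty attribute set, which would empty the intersection; how the paper handles that case is not stated in the abstract.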

Highlights

  • Information system security relies on several security solutions, such as Intrusion Detection Systems (IDS), Intrusion Prevention Systems (IPS), antivirus software, and firewalls

  • The RoughSets package in R implements rough set theory (RST) and fuzzy rough set theory (FRST) to model and analyze data

  • We will first explain the motivation for proposing IRS by discussing the computational complexity of the traditional rough set theory when working with high dimensional datasets


Summary

Introduction

Information system security relies on several security solutions, such as IDS, IPS, antivirus software, and firewalls. Based on the work in [16,17,18,19], our study proposes a novel algorithm that uses the rough set package in the R language to find the optimal minimal subset of attributes, rather than merely a smaller one, without sacrificing performance. The motivation for this methodology is to overcome the prohibitive complexity of RST concepts when searching for an optimal attribute subset, especially with big data. Such a solution will enhance the efficiency of real-time security analysis, e.g., real-time IPS. The contributions of the study are:

  • Developing a new algorithm that uses basic RST concepts to create minimal reducts

  • Offering a feasible feature selection methodology that scales to huge datasets without sacrificing performance

  • Creating a minimal decision rule database that retains the information content

  • Using three benchmark UCI datasets to evaluate the performance of the methodology

  • Comparing the results of the proposed model with recent works

Related Works
Rough Set
R Language
Research Methodology
Problem Statement and Motivation
Datasets
Generating Minimal Decision Rules
Execution Time Comparison with Existing Methods
Findings
Conclusions and Future Works
