Rapid and optimized parallel attribute reduction based on neighborhood rough sets and MapReduce

V.K Hanuman Turaga, Srilatha Chebrolu

doi:10.1016/j.eswa.2024.125323

Abstract

Attribute reduction is a crucial step in data pre-processing and feature engineering. It selects a subset of relevant data attributes to reduce the computational complexity of machine learning models and improve their performance. Neighborhood rough set (NRS) theory provides a valuable framework for attribute reduction: it uses neighborhood information to identify non-redundant, informative attributes for data analysis and machine learning tasks. Attribute subsets obtained with NRS theory are of high quality and yield effective prediction accuracies in Euclidean space. However, existing NRS-based solutions are resource-intensive because finding neighborhoods requires a large search space and involves redundant computations. To overcome these limitations, we propose the rapid and optimized attribute reduction (ROAR) algorithm, which optimizes the current state-of-the-art attribute-reduction method in NRS theory. The strength of ROAR lies in its ability to accelerate computations by rapidly determining the neighborhood consistency of data samples, which in turn expedites the identification of both positive and boundary regions. This efficiency significantly reduces the overall processing time for data analysis tasks. Experimental results on 12 standard datasets demonstrate that the ROAR algorithm is highly efficient, obtaining accurate reduction results with rapid response times. To ensure that the ROAR algorithm is suitable for high-dimensional datasets, we provide a parallel implementation, namely, the P-ROAR algorithm. The P-ROAR algorithm is the first parallel attribute-reduction algorithm in the classical NRS theory. Computational speeds and scalability metrics establish that P-ROAR is much faster and more scalable for datasets with an enormous attribute space. These algorithms provide a tool for handling feature reduction in data engineering without compromising accuracy and performance.
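To make the terms neighborhood consistency and positive region concrete, the following is a minimal brute-force sketch of the classical NRS definitions with Euclidean distance. It is not the ROAR or P-ROAR algorithm, and it performs exactly the kind of exhaustive neighborhood search that the abstract describes as costly. The function names, the radius parameter `delta`, and the greedy forward-selection loop are illustrative assumptions, not taken from the paper.

```python
import math


def neighborhood_positive_region(samples, labels, attrs, delta):
    """Indices of samples whose delta-neighborhood (over attrs) has a single label.

    A sample is neighborhood-consistent when every sample within Euclidean
    distance delta, measured on the attribute subset attrs, shares its label.
    Inconsistent samples belong to the boundary region.
    """
    positive = []
    for i, x in enumerate(samples):
        consistent = True
        for j, y in enumerate(samples):
            dist = math.sqrt(sum((x[a] - y[a]) ** 2 for a in attrs))
            if dist <= delta and labels[j] != labels[i]:
                consistent = False
                break
        if consistent:
            positive.append(i)
    return positive


def dependency(samples, labels, attrs, delta):
    """Fraction of samples that fall in the positive region."""
    return len(neighborhood_positive_region(samples, labels, attrs, delta)) / len(samples)


def greedy_reduct(samples, labels, delta):
    """Hypothetical forward selection: add the attribute with the highest
    dependency until the dependency of the full attribute set is reached."""
    remaining = list(range(len(samples[0])))
    target = dependency(samples, labels, remaining, delta)
    reduct = []
    current = dependency(samples, labels, reduct, delta)
    while current < target and remaining:
        best = max(remaining, key=lambda a: dependency(samples, labels, reduct + [a], delta))
        reduct.append(best)
        remaining.remove(best)
        current = dependency(samples, labels, reduct, delta)
    return reduct
```

Each dependency evaluation in this sketch compares every pair of samples, so the cost grows quadratically with the number of samples and is repeated for every candidate attribute. That repeated work is the overhead the abstract attributes to existing NRS-based methods.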

Full Text