Enhancing N-List Structure and Performance for Efficient Large Dataset Analysis

Arkan A Ghaib, Abdullah A Nahi

doi:10.47760/ijcsmc.2024.v13i01.003

Abstract

Efficiently analyzing enormous datasets is one of the main challenges in data-intensive fields such as scientific research, data mining, and machine learning. The N-list is a popular data structure in similarity search algorithms, where it speeds up the retrieval of nearest neighbors. This paper presents EN-list, a high-performance method for mining frequent itemsets. It represents itemsets using N-lists and finds frequent itemsets directly by traversing a set-enumeration search tree. In particular, it drastically reduces the search space by applying a powerful pruning technique known as Children-Parent Equivalency pruning. We conducted extensive experiments comparing EN-list against three state-of-the-art algorithms (Fin, PrePost, and DiffNodesets) on four real datasets. The experimental results show that EN-list is the fastest approach on all four datasets. Furthermore, EN-list shows good memory consumption, requiring less memory than DiffNodesets and PrePost and only slightly more than Fin.
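To make the search strategy concrete, the following is a minimal sketch of a set-enumeration search with an equivalence-based pruning check. It is not the authors' EN-list implementation: plain transaction-ID sets stand in for N-lists, and the names `mine_frequent_itemsets` and `expand` are hypothetical. The pruning rule assumed here is the usual one: if adding an item to an itemset leaves its support unchanged, that item is never expanded as its own branch and is instead attached to every itemset found below the current node.

```python
from itertools import combinations


def mine_frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset.

    Assumption: transaction-ID sets replace N-lists, so support is the
    size of an intersection rather than a sum over N-list node counts.
    """
    # Map each item to the set of transaction ids that contain it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    # Keep frequent items only, ordered by ascending support.
    items = sorted(
        (item for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda item: len(tidsets[item]),
    )
    results = {}

    def expand(prefix, prefix_tids, candidates, inherited_equiv):
        equiv = list(inherited_equiv)  # items that never change the support
        children = []                  # items that open a new branch
        for item in candidates:
            tids = prefix_tids & tidsets[item]
            if len(tids) == len(prefix_tids):
                equiv.append(item)     # equivalence pruning: no branch
            elif len(tids) >= min_support:
                children.append((item, tids))

        # The prefix plus any subset of the equivalent items has the same support.
        support = len(prefix_tids)
        for size in range(len(equiv) + 1):
            for extra in combinations(equiv, size):
                itemset = frozenset(prefix + extra)
                if itemset:            # skip the empty itemset at the root
                    results[itemset] = support

        # Set-enumeration step: each child may only be extended by later siblings.
        for index, (item, tids) in enumerate(children):
            later = [other for other, _ in children[index + 1:]]
            expand(prefix + (item,), tids, later, equiv)

    expand((), set(range(len(transactions))), items, [])
    return results


if __name__ == "__main__":
    demo = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["a"]]
    for itemset, support in sorted(
        mine_frequent_itemsets(demo, 2).items(), key=lambda kv: sorted(kv[0])
    ):
        print(sorted(itemset), support)
```

In the demo data, item `a` occurs in every transaction, so it is absorbed by the equivalence check at the root and the search only branches on `b` and `c`. An N-list based miner would apply the same check to supports computed from N-list intersections.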

Full Text