A New Algorithm for Extracting Textual Maximal Frequent Itemsets from Arabic Documents

Zeyad Hamid, Hussein K Khafaji

doi:10.1088/1742-6596/1773/1/012012

Open Access

https://doi.org/10.1088/1742-6596/1773/1/012012

Abstract

In this paper, a new technique is proposed for extracting textual maximal frequent itemsets, named the Maximal Itemset Miner Algorithm (MIMA). The algorithm begins its search by generating the best initial border in the search space, based on the minimum support of the first-level items that satisfy the general minimum support set by the user. To count itemset support, our approach combines a vertical representation of the data with a queue data structure that stores the itemsets. To reduce the search space, the algorithm applies several pruning conditions to each itemset in the initial border. Experiments were performed on the standard textual CNN Arabic dataset, and the proposed method registers a shorter execution time than the Apriori algorithm when applied to three datasets of different sizes.
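
The two ingredients named in the abstract, a vertical data representation and a queue of itemsets, can be illustrated with a minimal Python sketch. This is not MIMA itself: the abstract does not specify how the initial border is built or which pruning conditions are used, so the sketch falls back on a plain breadth-first search from single items. The function names (`build_vertical`, `maximal_frequent_itemsets`), the toy documents, and the use of an absolute document count for `min_support` are all assumptions made for illustration.

```python
from collections import deque


def build_vertical(docs):
    """Vertical representation: map each term to the set of ids of documents containing it."""
    vertical = {}
    for doc_id, terms in enumerate(docs):
        for term in set(terms):
            vertical.setdefault(term, set()).add(doc_id)
    return vertical


def maximal_frequent_itemsets(docs, min_support):
    """Return the maximal frequent itemsets; min_support is an absolute document count (assumption)."""
    vertical = build_vertical(docs)
    # Frequent single terms, in a fixed order so that each itemset is generated only once.
    items = sorted(t for t, ids in vertical.items() if len(ids) >= min_support)
    position = {item: i for i, item in enumerate(items)}

    # The queue stores (itemset, ids of documents containing every term of the itemset).
    queue = deque(((item,), vertical[item]) for item in items)
    frequent = []
    while queue:
        itemset, doc_ids = queue.popleft()
        frequent.append(frozenset(itemset))
        # Extend only with terms that come later in the fixed order.
        for item in items[position[itemset[-1]] + 1:]:
            # Support of the extended itemset is the size of the intersection of id sets.
            new_ids = doc_ids & vertical[item]
            if len(new_ids) >= min_support:
                queue.append((itemset + (item,), new_ids))

    # An itemset is maximal if no frequent itemset is a proper superset of it.
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical toy corpus: each document is a list of terms.
docs = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
result = maximal_frequent_itemsets(docs, min_support=3)
```

With this toy corpus and `min_support=3`, every pair of terms occurs in three documents while the triple occurs in only two, so the maximal frequent itemsets are the three pairs. Counting support by intersecting document-id sets avoids rescanning the documents for each candidate, which is the usual motivation for a vertical layout.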
