Top-down vertical itemset mining

Mohammad Karim Sohrabi, Vahid Ghods

doi:10.1117/12.2179150

Abstract

Vertical itemset mining is an important frequent pattern mining problem with broad applications. The problem is challenging because a traditional horizontal algorithm may need to examine a combinatorially explosive number of possible item patterns in a dataset. Since high-dimensional datasets typically contain a large number of columns and a small number of rows, vertical itemset mining algorithms, which extract the frequent itemsets of a dataset by producing all combinations of row ids, are a good alternative to horizontal algorithms for mining frequent itemsets from high-dimensional datasets. Because a rowset can be produced simply by adding a new row id to one of its sub-rowsets, many bottom-up vertical itemset mining algorithms have been designed and presented in the literature. However, bottom-up vertical mining algorithms suffer from a major drawback: they start generating and testing rowsets from the small rowsets and proceed to the larger ones, even though the small rowsets cannot produce frequent itemsets because they contain fewer rows than the minimum support threshold. In this paper, we describe a new efficient vertical top-down algorithm called VTD (Vertical Top Down) for mining frequent itemsets in high-dimensional datasets. Our top-down approach uses the minimum support threshold to prune the rowsets from which no itemset can be extracted. Several experiments on real bioinformatics datasets show that VTD is orders of magnitude better than previous closed pattern mining algorithms. Our performance study shows that the algorithm substantially outperforms the best previous algorithms.
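
The top-down idea can be illustrated with a minimal sketch. This is not the VTD algorithm itself, whose details are not given in the abstract. It is a naive enumeration that assumes each row is a set of items, starts from the rowset containing every row, and removes one row id at a time. The walk stops as soon as a rowset has fewer rows than the minimum support threshold, so the small rowsets that a bottom-up search would generate first are never visited. The function name `mine_top_down` and its helpers are hypothetical, and the result contains only the itemsets shared by some rowset (closed itemsets), each with its support.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def mine_top_down(rows: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Naive top-down rowset enumeration (illustrative sketch, not VTD).

    Returns a mapping from each non-empty itemset shared by some rowset of
    at least `min_support` rows to the number of rows that contain it.
    """
    if min_support < 1:
        raise ValueError("min_support must be at least 1")

    found: Dict[FrozenSet[str], int] = {}

    def common_items(rowset: Tuple[int, ...]) -> FrozenSet[str]:
        # Items present in every row of the rowset.
        items = set(rows[rowset[0]])
        for row_id in rowset[1:]:
            items &= rows[row_id]
        return frozenset(items)

    def visit(rowset: Tuple[int, ...], first_removable: int) -> None:
        # Pruning by the minimum support threshold: a rowset with fewer
        # than min_support rows cannot certify a frequent itemset.
        if len(rowset) < min_support:
            return
        items = common_items(rowset)
        if items and items not in found:
            # The true support counts every row that contains the itemset.
            found[items] = sum(1 for row in rows if items <= row)
        # Remove row ids in increasing order so each rowset is visited once.
        for row_id in rowset:
            if row_id >= first_removable:
                child = tuple(r for r in rowset if r != row_id)
                visit(child, row_id + 1)

    visit(tuple(range(len(rows))), 0)
    return found
```

For the four rows `{a, b, c}`, `{a, b}`, `{a, c}`, `{b, c}` and a minimum support of 3, the sketch visits only the full rowset and the four rowsets of three rows, and reports `{a}`, `{b}`, and `{c}`, each with support 3. The enumeration is still exponential in the number of rows in the worst case, which is acceptable only under the abstract's assumption that high-dimensional datasets have few rows.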
