Abstract

Association rule mining can be divided into two steps. The first step is to find all frequent itemsets, that is, all itemsets whose number of occurrences in the database is greater than or equal to a user-specified threshold. The second step is to generate reliable association rules from the frequent itemsets found in the first step. Identifying all frequent itemsets in a large database dominates the overall cost of association rule mining. In this paper, we propose an efficient hash-based method, HMFS, for discovering the maximal frequent itemsets. The HMFS method combines the advantages of the DHP (Direct Hashing and Pruning) and Pincer-Search algorithms. The combination leads to two advantages. First, the HMFS method can, in general, reduce the number of database scans. Second, the HMFS method can filter out infrequent candidate itemsets and use the filtered itemsets to find the maximal frequent itemsets. Together, these two advantages reduce the overall computing time needed to find the maximal frequent itemsets. In addition, the HMFS method provides an efficient mechanism for constructing the maximal frequent candidate itemsets, which reduces the search space. We have implemented the HMFS method, along with the DHP and Pincer-Search algorithms, on a Pentium III 800 MHz PC. The experimental results show that the HMFS method performs better than the DHP and Pincer-Search algorithms in most test cases. In particular, our method improves significantly on the DHP and Pincer-Search algorithms when the database is large and the longest frequent itemset is relatively long.
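
To make the terms "frequent itemset" and "maximal frequent itemset" concrete, the following is a minimal brute-force sketch in Python. It is not the HMFS method, and it uses neither DHP hashing nor the Pincer-Search strategy; it only illustrates the definitions. The function names, the toy transactions, and the use of an absolute occurrence count as the threshold are assumptions made for this illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset occurring in at least
    min_support transactions. Brute force, for illustration only."""
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            # Count the transactions that contain every item of the candidate.
            count = sum(1 for t in transactions if cset <= t)
            if count >= min_support:
                frequent[cset] = count
                found_any = True
        if not found_any:
            # Every subset of a frequent itemset is frequent, so if no
            # itemset of this size is frequent, no larger one can be.
            break
    return frequent


def maximal_frequent_itemsets(frequent):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical toy database of five transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]

frequent = frequent_itemsets(transactions, min_support=3)
maximal = maximal_frequent_itemsets(frequent)
print(sorted(sorted(s) for s in maximal))
# prints [['a', 'b'], ['a', 'c'], ['b', 'c']]
```

In this toy database, six itemsets are frequent at a threshold of 3, but only the three pairs are maximal; every other frequent itemset is a subset of one of them. This is why finding the maximal frequent itemsets is enough to describe the whole set of frequent itemsets, although the occurrence counts of the subsets still have to be obtained separately for rule generation.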
