Abstract

The Eclat algorithm is one of the most widely used frequent itemset mining methods. In the standard Eclat algorithm and its variants, calculating the intersection size of itemsets by sequentially comparing elements is inefficient, especially for large-scale transaction data. In this paper, we propose fast Eclat algorithms that quickly calculate the intersection size of multiple itemsets by using minwise hashing and two estimators. Minwise hashing is used to calculate the Jaccard similarity coefficient by mapping the elements of the sets to those of smaller sets. The two estimators then estimate the intersection size of itemsets from the Jaccard similarity coefficient. Because the hash function is “imperfect”, minwise hashing may yield a biased Jaccard similarity, which results in an error between the real and estimated values of the intersection size. We therefore propose HashEclat, which uses the maximum of $|A|$ and $|B|$ to represent the union size $|A \cup B|$, and Sim-Eclat, which uses the minimum of $|A|$ and $|B|$ to represent the intersection size $|A \cap B|$. Furthermore, we use a boundary error $E$ to improve performance: if $E$ is large, the intersection size is determined by a traditional method, and the result is more accurate but takes longer to compute; otherwise, the result is less accurate but faster to compute. Both the theoretical analysis and experimental results show that the proposed algorithms can obtain almost all frequent itemsets with higher speed and less memory usage than other algorithms.
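
The idea of estimating an intersection size from a minwise-hashing estimate of the Jaccard coefficient can be illustrated with a minimal Python sketch. It is not the HashEclat or Sim-Eclat estimator, whose exact form the abstract does not give. Instead it assumes the standard identity $|A \cap B| = J(|A| + |B|)/(1 + J)$, which follows from $J = |A \cap B| / |A \cup B|$ and $|A \cup B| = |A| + |B| - |A \cap B|$. The function names, the linear hash family, the signature length of 256, and the example transaction-id sets are all hypothetical choices made for illustration.

```python
import random

# Assumption: transaction ids are non-negative integers below this prime.
PRIME = 2_147_483_647  # 2**31 - 1


def make_hashes(k, seed=0):
    """Draw k hash functions h(x) = (a*x + b) mod PRIME (illustrative family)."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def signature(tids, hashes):
    """Minwise signature: the minimum hash value of the set under each function."""
    return [min((a * t + b) % PRIME for t in tids) for a, b in hashes]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions where the two minima coincide."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


def estimate_intersection(size_a, size_b, jaccard):
    """Solve J = |A∩B| / (|A| + |B| - |A∩B|) for |A∩B|."""
    return jaccard * (size_a + size_b) / (1 + jaccard)


# Hypothetical transaction-id sets of two itemsets (vertical data format).
tids_a = set(range(0, 1000))
tids_b = set(range(500, 1500))

hashes = make_hashes(256)
j_hat = estimate_jaccard(signature(tids_a, hashes), signature(tids_b, hashes))
estimated = estimate_intersection(len(tids_a), len(tids_b), j_hat)
exact = len(tids_a & tids_b)  # sequential comparison, for reference

print(f"estimated Jaccard: {j_hat:.3f}")
print(f"estimated |A ∩ B|: {estimated:.1f} (exact: {exact})")
```

Once the signatures are stored, each comparison costs time proportional to the signature length rather than to the sizes of the sets. The price is an estimation error that shrinks as more hash functions are used.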

