Abstract

Computing the frequent subsets of large multi-attribute data is a key component of data mining algorithms for local pattern detection. The task is both computation- and data-intensive. Standard parallel algorithms require multiple passes through the data, and the cost of this data access can easily outweigh any performance gained by parallelizing the computational part. We address two opportunities for improving performance. The first is a parallel approximate algorithm that requires only a single pass over the data. The second is a probabilistic technique that avoids generating most of the lattice of subsets implied by each object's data. The computation required is only slightly greater than that of levelwise algorithms, but the amount of data access is much smaller.
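The abstract does not say which probabilistic technique is used, so the following is only an illustrative sketch of one generic way to combine a single pass with sampling of each object's subset lattice. It is not the authors' method. The function names (`sampled_support`, `exact_support`, `frequent`) and the parameter `samples_per_object` are hypothetical. For an object with k attributes, the sketch draws m uniformly random subsets of those attributes instead of enumerating all 2^k of them. It weights each draw by 2^k / m, so the expected accumulated count for every subset equals its true support.

```python
import random
from collections import Counter


def exact_support(transactions, itemset):
    """Reference count: number of objects whose attribute set contains itemset."""
    target = frozenset(itemset)
    return sum(1 for t in transactions if target <= t)


def sampled_support(transactions, samples_per_object, seed=0):
    """Hypothetical single-pass estimator (an assumption, not the paper's algorithm).

    For each object with k items, draw `samples_per_object` subsets uniformly
    from its 2^k-element lattice (each item kept with probability 1/2) and
    add weight 2^k / samples_per_object to each drawn subset. A given subset S
    of the object is drawn with probability 1 / 2^k per sample, so its expected
    contribution is exactly 1. This makes the estimate unbiased.
    """
    rng = random.Random(seed)
    estimates = Counter()
    for t in transactions:  # the data is read exactly once
        items = sorted(t)  # fixed order keeps results reproducible for a seed
        weight = (2 ** len(items)) / samples_per_object
        for _ in range(samples_per_object):
            subset = frozenset(x for x in items if rng.random() < 0.5)
            if subset:  # the empty set is not a useful pattern; skipping it biases nothing else
                estimates[subset] += weight
    return estimates


def frequent(estimates, min_support):
    """Keep subsets whose *estimated* support reaches the threshold."""
    return {s: c for s, c in estimates.items() if c >= min_support}


if __name__ == "__main__":
    data = [frozenset(t) for t in (["a", "b", "c"], ["a", "b"], ["a", "c", "d"], ["b", "c"])]
    est = sampled_support(data, samples_per_object=2000, seed=1)
    for s in sorted(frequent(est, 2.0), key=sorted):
        print(sorted(s), round(est[s], 2), exact_support(data, s))
```

Because the thresholded counts are estimates, subsets near `min_support` may be reported or missed incorrectly. The variance per object is (2^k - 1) / m, so a larger `samples_per_object` trades extra computation for accuracy, while the data is still scanned only once.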
