Abstract
With the rapid development of the World Wide Web technology, complex and diverse data present explosive growth, so frequent itemset mining plays an essential role. In view of the mining frequent itemsets in multiple data streams by limited computing power of a single processor, an improved algorithm of Parallel Mining Collaborative frequent itemsets in multiple data streams (PMCMD-Stream) was proposed. Firstly, the algorithm compresses the potential and frequent itemsets into CP-Tree only by one-scan and applies increment method to inserting or deleting related branch on CP-Tree, we do not need to repeatedly scanning the databases to generate many candidate frequent itemsets and save the running time. Secondly, this parallelized algorithm can be run in the MapReduce programming environment. Finally, the valuable frequent itemsets, namely global collaborative frequent itemsets, were obtained. Because each candidate frequent itemset is independent, and different candidate frequent itemsets can be processed by multiple computing machines concurrently. The experimental results show that PMCMD-Stream algorithm not only can improve the mining efficiency but also have much better scalability than the existing algorithms, so as to discover the collaborative frequent itemsets from large-scale data streams.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.