Abstract

Due to the rapid growth of resource sharing, distributed systems are developed, which can be used to utilize the computations. Data mining (DM) provides powerful techniques for finding meaningful and useful information from a very large amount of data, and has a wide range of real‐world applications. However, traditional DM algorithms assume that the data is centrally collected, memory‐resident, and static. It is challenging to manage the large‐scale data and process them with very limited resources. For example, large amounts of data are quickly produced and stored at multiple locations. It becomes increasingly expensive to centralize them in a single place. Moreover, traditional DM algorithms generally have some problems and challenges, such as memory limits, low processing ability, and inadequate hard disk, and so on. To solve the above problems, DM on distributed computing environment [also called distributed data mining (DDM)] has been emerging as a valuable alternative in many applications. In this study, a survey of state‐of‐the‐art DDM techniques is provided, including distributed frequent itemset mining, distributed frequent sequence mining, distributed frequent graph mining, distributed clustering, and privacy preserving of distributed data mining. We finally summarize the opportunities of data mining tasks in distributed environment. WIREs Data Mining Knowl Discov 2017, 7:e1216. doi: 10.1002/widm.1216This article is categorized under: Application Areas > Business and Industry Fundamental Concepts of Data and Knowledge > Motivation and Emergence of Data Mining Technologies > Computer Architectures for Data Mining

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.