Abstract

Data mining is the extraction of hidden, previously unknown knowledge and rules of potential value for decision making from large volumes of data in databases. Association rule mining is a major research area within data mining and is widely used in practice. With the development of network technology and the growing use of IT applications, distributed databases have become common. Distributed data mining extracts global knowledge that is useful for management and decision making from geographically distributed databases, and it has become an important issue in data mining research. It allows a mining task to be carried out by computers at different sites on the Internet, which not only improves mining efficiency and reduces the amount of data transmitted over the network, but also benefits data security and privacy. Building on the related theory and the current state of research in data mining and distributed data mining, this thesis focuses on the architecture of distributed mining systems and on distributed association rule mining algorithms. The thesis first proposes a multi-agent-based architecture for a distributed data mining system. It adopts a star network topology and uses multiple agents to mine massive data stored in a distributed manner. Based on the proposed system, the thesis then presents a new distributed association rule mining algorithm, the RK-tree algorithm. The RK-tree algorithm is based on the idea of combining knowledge twice. Each sub-site first mines the local frequent itemsets from its local database and sends them to the main site. The main site combines these local frequent itemsets into a set of global candidate frequent itemsets and sends the candidates back to each sub-site. Each sub-site counts the support of the global candidate frequent itemsets and returns the counts to the main site.
Finally, the main site combines the results sent by the sub-sites to obtain the global frequent itemsets and the global association rules. The algorithm needs only three rounds of communication between the main site and the sub-sites, which greatly reduces both the volume and the number of communications and improves efficiency. Moreover, each sub-site can use any existing, well-performing centralized association rule mining algorithm for local mining, which gives better local mining efficiency and reduces the workload. The algorithm is simple and easy to implement. The last part of the thesis summarizes the work and outlines directions for further research.
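
The three-round exchange described above can be illustrated with a minimal Python sketch. It is not the thesis's RK-tree implementation: the abstract does not describe the tree structure, so none is modeled here. The function names (`local_frequent_itemsets`, `count_support`, `mine_global`), the brute-force local miner, the relative support threshold shared by all sites, and the toy transactions are all assumptions made for illustration. Network communication is simulated with plain function calls, and rule generation from the final itemsets is omitted.

```python
from itertools import combinations


def local_frequent_itemsets(transactions, min_support):
    """Hypothetical local miner: brute-force level-wise search.

    Stands in for any centralized algorithm a sub-site might use.
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    frequent = set()
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_support * n:
                frequent.add(candidate)
                found = True
        if not found:
            # No frequent k-itemset means no larger itemset can be frequent.
            break
    return frequent


def count_support(transactions, candidates):
    """Sub-site step: absolute support count of each global candidate."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}


def mine_global(sites, min_support):
    """Main-site coordination; each list comprehension simulates one exchange."""
    # Communication 1: sub-sites send their local frequent itemsets.
    local_results = [local_frequent_itemsets(db, min_support) for db in sites]
    # First combination: the union forms the global candidate set.
    candidates = set().union(*local_results)
    # Communication 2 sends the candidates out; communication 3 returns counts.
    counts = [count_support(db, candidates) for db in sites]
    # Second combination: sum the counts and keep globally frequent itemsets.
    total = sum(len(db) for db in sites)
    result = {}
    for c in candidates:
        support = sum(site_counts[c] for site_counts in counts) / total
        if support >= min_support:
            result[c] = support
    return result


# Toy data (assumed): two sub-sites with four transactions each.
site_a = [{"bread", "milk"}, {"bread", "butter"},
          {"bread", "milk", "butter"}, {"milk"}]
site_b = [{"bread", "milk"}, {"beer"}, {"bread", "beer"}, {"bread", "milk"}]

result = mine_global([site_a, site_b], min_support=0.5)
```

Under the assumption that every site applies the same relative threshold, the union of local frequent itemsets cannot miss a globally frequent itemset: an itemset that is infrequent at every site is infrequent overall. Here `{butter}` and `{beer}` are each frequent at one site but are dropped after the global count.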
