Abstract

Data mining, and association rule mining in particular, is an important technology for handling the large volumes of data accumulated in the medical field and for realizing the value of that data. The traditional Apriori algorithm spends too long scanning the database and generates too many candidate itemsets. To address this problem, and in view of the characteristics of large medical data, a method that maps itemsets to numbers and sorts them is proposed. Supersets are generated with a base model and a generation model, which improves the efficiency of superset generation and pruning. The improved algorithm is ported to the open-source Hadoop platform using the MapReduce framework, and parallelism is introduced by partitioning the database. Experimental results show that the approach addresses redundancy in large-scale data sets and gives the Apriori algorithm good parallel scalability. Finally, an example is given to demonstrate the feasibility of the improved algorithm.
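The ideas summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the base model and generation model, so the sketch substitutes the standard Apriori prefix join on sorted integer itemsets, and it emulates the MapReduce partition counting with plain in-process loops rather than Hadoop. All function names (`encode`, `join_and_prune`, `count_partition`, `apriori`) and the frequency-based ordering of the integer codes are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def encode(transactions):
    """Map each item to an integer code and return sorted integer transactions."""
    freq = Counter(item for t in transactions for item in set(t))
    # Assumption: codes follow descending frequency, ties broken by item name.
    ordered = sorted(freq, key=lambda item: (-freq[item], item))
    code = {item: i for i, item in enumerate(ordered)}
    encoded = [tuple(sorted({code[i] for i in t})) for t in transactions]
    return encoded, code


def join_and_prune(frequent_k):
    """Join sorted k-itemsets sharing a (k-1)-prefix, then prune by subsets."""
    frequent = set(frequent_k)
    ordered = sorted(frequent)
    candidates = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if a[:-1] != b[:-1]:
                break  # sorted order: no later itemset shares this prefix
            cand = a + (b[-1],)
            # Keep the candidate only if every k-subset is itself frequent.
            if all(s in frequent for s in combinations(cand, len(a))):
                candidates.append(cand)
    return candidates


def count_partition(partition, candidates):
    """Map step (emulated): count candidate supports within one partition."""
    counts = Counter()
    for t in partition:
        items = set(t)
        for c in candidates:
            if items.issuperset(c):
                counts[c] += 1
    return counts


def apriori(transactions, min_support, n_partitions=2):
    """Return ({itemset of codes: support count}, item-to-code mapping)."""
    encoded, code = encode(transactions)
    # Split the database into partitions, as a parallel job would.
    partitions = [encoded[i::n_partitions] for i in range(n_partitions)]
    candidates = sorted({(i,) for t in encoded for i in t})
    result = {}
    while candidates:
        total = Counter()
        for p in partitions:
            # Reduce step (emulated): sum the partial counts per candidate.
            total.update(count_partition(p, candidates))
        frequent = {c: n for c, n in total.items() if n >= min_support}
        result.update(frequent)
        candidates = join_and_prune(frequent)
    return result, code
```

Calling `apriori(transactions, min_support)` on a list of item lists returns the frequent itemsets as sorted tuples of integer codes, together with the mapping needed to decode them. Because the itemsets are sorted integer tuples, the join can stop scanning as soon as the shared prefix changes, which is one way the sorting step can reduce candidate generation work.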
