Abstract

Discovering association rules in a large database is a challenging data mining task. Most existing association rule mining algorithms make repeated passes over the entire database to determine the frequent itemsets, which can incur an extremely high I/O overhead. A simple but effective way to overcome this problem is to mine a sample of the database, chosen so that the resulting rules are as accurate as possible on the full database. Numerous researchers have proposed sampling approaches for faster and more efficient mining of association rules. In this paper, we propose a novel progressive sampling-based approach for mining association rules from a large database. First, frequent patterns are extracted with the Apriori algorithm from an initial sample that is selected based on the temporal characteristics and the size of the database. From the frequent itemsets generated, the negative border of the initial sample is obtained and sorted. The midpoint itemset of the sorted negative border is then checked against the full database to determine whether it is frequent. Based on the support computed for the midpoint itemset, either the sample size is progressively increased in search of an optimal sample, or the current sample is accepted as optimal and association rules are mined from it. The experimental results demonstrate the efficiency of the proposed progressive sampling approach in mining association rules.
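
The procedure outlined above can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several details are assumptions, because the abstract does not specify them: the initial sample is simply the first `initial_size` transactions (a stand-in for the temporal selection), the negative border is sorted by support in the sample, the sample grows by a fixed `step`, and the sample is accepted when the midpoint itemset is infrequent in the full database. The names `support`, `apriori_with_border`, and `progressive_sample` are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def apriori_with_border(transactions, min_support, items=None):
    """Return (frequent, border) as dicts mapping itemset -> support.

    border is the negative border: infrequent itemsets whose proper
    subsets are all frequent. items is the item universe; it defaults
    to the items that occur in transactions.
    """
    if items is None:
        items = {i for t in transactions for i in t}
    candidates = [frozenset([i]) for i in sorted(items)]
    frequent, border = {}, {}
    k = 1
    while candidates:
        level = []
        for c in candidates:
            s = support(c, transactions)
            if s >= min_support:
                frequent[c] = s
                level.append(c)
            else:
                border[c] = s
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        ]
    return frequent, border


def progressive_sample(database, min_support, initial_size, step):
    """Grow a prefix sample until the midpoint border itemset is infrequent.

    Assumptions (not from the paper): prefix sampling, border sorted by
    sample support, fixed growth step, acceptance when the midpoint
    itemset is infrequent in the full database.
    """
    universe = {i for t in database for i in t}
    size = min(initial_size, len(database))
    while True:
        sample = database[:size]
        frequent, border = apriori_with_border(sample, min_support, universe)
        if size == len(database) or not border:
            return sample, frequent
        ranked = sorted(border, key=lambda c: (border[c], sorted(c)))
        midpoint = ranked[len(ranked) // 2]
        # One itemset is counted against the full database per iteration.
        if support(midpoint, database) < min_support:
            return sample, frequent
        size = min(size + step, len(database))


# Toy database; list order is treated as time order (an assumption).
database = [
    frozenset("ab"), frozenset("ab"), frozenset("ac"),
    frozenset("abc"), frozenset("bc"), frozenset("ab"),
]
sample, frequent = progressive_sample(database, 0.5, initial_size=2, step=2)
```

In this sketch each iteration counts only one itemset against the full database, which reflects the I/O saving the abstract aims for; rule generation from the accepted sample is omitted.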
