Abstract

The discovery of association rules is a prototypical problem in data mining. The current algorithms proposed for data mining of association rules make repeated passes over the database to determine the commonly occurring item sets (or set of items). For large databases, the I/O overhead in scanning the database can be extremely high. The authors show that random sampling of transactions in the database is an effective method for finding association rules. Sampling can speed up the mining process by more than an order of magnitude by reducing I/O costs and drastically shrinking the number of transactions to be considered. They may also be able to make the sampled database resident in main-memory. Furthermore, they show that sampling can accurately represent the data patterns in the database with high confidence. They experimentally evaluate the effectiveness of sampling on different databases, and study the relationship between the performance, accuracy, and confidence of the chosen sample.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.