Abstract

Despite the great demand for knowledge discovery in databases (KDD) in large database and data warehouse systems, KDD algorithms have generally been applied to relatively small data samples and are not integrated with relational DBMSs. Applying KDD algorithms to large databases raises serious scalability problems, in particular unacceptably long processing times. This paper proposes a framework for data-parallel KDD whose main aim is to improve the efficiency and scalability of KDD algorithms. The approach is based on generic, context-free, set-oriented primitives. The primitives are generic in the sense that they capture the core operations underlying a number of KDD algorithms, which is important because no single algorithm can be expected to perform well across all domains. The primitives are also set-oriented, i.e. they operate on data elements independently of the order of those elements. This allows data parallelism to be exploited efficiently on cost-effective parallel database servers through SQL queries. (4 pages)
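
As an illustration only, a set-oriented primitive of the kind described above might be sketched as a single SQL aggregation query. The abstract does not specify the primitives, so everything in this sketch is an assumption: the function name `count_by_attribute_and_class`, the `customers` table and its columns are hypothetical, and SQLite stands in for a parallel database server. The sketch counts tuples per (attribute value, class value) pair, an operation that several rule-induction and decision-tree algorithms rely on, and the result does not depend on the order of the rows.

```python
import sqlite3


def count_by_attribute_and_class(conn, table, attribute, class_column):
    """Return {(attribute_value, class_value): count} from one GROUP BY query.

    Hypothetical primitive for illustration; not taken from the paper.
    Identifiers are interpolated directly, so names are assumed to be trusted.
    """
    query = (
        f"SELECT {attribute}, {class_column}, COUNT(*) "
        f"FROM {table} GROUP BY {attribute}, {class_column}"
    )
    return {(a, c): n for a, c, n in conn.execute(query)}


# Small in-memory example; the table and data are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (region TEXT, churned TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("north", "yes"), ("north", "no"), ("south", "no"), ("north", "no")],
)

counts = count_by_attribute_and_class(conn, "customers", "region", "churned")
```

Because the aggregation is expressed declaratively, a database server is free to partition the table and compute partial counts in parallel before merging them; the calling algorithm only sees the final counts.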

