Abstract

Most KDD (Knowledge Discovery in Databases) algorithms proposed in the literature have been applied only to relatively small datasets and offer no integration with a DBMS. Applying these algorithms to the huge amounts of data held in current databases and data warehouses therefore raises serious scalability problems, chiefly excessive learning time. This paper investigates one way of improving the scalability of KDD algorithms: discretization of ordinal or continuous attributes. The work has two novel aspects. First, we map a generic discretization primitive into an SQL query. Second, we propose a new discretization algorithm for classification tasks. We show how the new discretization algorithm can be implemented effectively via the SQL primitive.
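The abstract does not define the primitive. Here is a minimal sketch that assumes it takes the form of a count-by-group query. Such a query returns class frequencies for each attribute value, so only a small summary leaves the DBMS instead of the raw rows. The table name, the column names, and both functions (`class_counts_by_value`, `merge_by_majority_class`) are hypothetical. The merge rule is a simple stand-in, not the new discretization algorithm proposed in the paper.

```python
import sqlite3


def class_counts_by_value(conn, table, attr, cls):
    """Hypothetical SQL primitive: per-(value, class) counts computed inside the DBMS."""
    # Identifiers are interpolated directly, so this assumes trusted table/column names.
    query = (
        f"SELECT {attr}, {cls}, COUNT(*) FROM {table} "
        f"GROUP BY {attr}, {cls} ORDER BY {attr}"
    )
    counts = {}
    for value, label, n in conn.execute(query):
        counts.setdefault(value, {})[label] = n
    return counts  # dict keeps the ascending attribute-value order from ORDER BY


def merge_by_majority_class(counts):
    """Illustrative stand-in discretizer (not the paper's algorithm):
    merge adjacent values that share the same majority class into one interval."""
    intervals = []  # each entry is [low, high, majority_class]
    for value, class_counts in counts.items():
        majority = max(class_counts, key=class_counts.get)
        if intervals and intervals[-1][2] == majority:
            intervals[-1][1] = value  # extend the current interval
        else:
            intervals.append([value, value, majority])  # start a new interval
    return [tuple(iv) for iv in intervals]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE r (x INTEGER, y TEXT)")
    conn.executemany(
        "INSERT INTO r VALUES (?, ?)",
        [(1, "a"), (1, "a"), (2, "a"), (3, "b"), (3, "b"), (4, "b"), (5, "a")],
    )
    print(merge_by_majority_class(class_counts_by_value(conn, "r", "x", "y")))
    # prints [(1, 2, 'a'), (3, 4, 'b'), (5, 5, 'a')]
```

In this sketch only the per-value class counts cross the DBMS boundary. The in-memory discretization step therefore scales with the number of distinct attribute values rather than with the number of rows.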
