Abstract

We discuss a knowledge discovery in databases (KDD) tool which condenses a large number of related attributes into a few that are weighted combinations of the original attributes. These new attributes, known as principal components in the statistical literature, may be used either to preprocess the data for input to other KDD tools or, with domain knowledge, to identify patterns. We have also developed an architecture and data model for the management of heterogeneous distributed databases which facilitates the efficient extraction of distributed data. Efficient processing is a necessary precursor to KDD in such large and heterogeneous databases, which are where the full potential of data mining is likely to be realised. We concentrate on providing a fast and efficient DBMS for large, distributed, heterogeneous databases; such a structure is essential if KDD is to become a practicable methodology. There is also scope for improving the performance of the statistical algorithm which extracts the principal components, by exploiting fast eigenvalue algorithms developed for parallel architectures and hardware.
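The condensation step the abstract describes — deriving a few principal components as weighted combinations of many related attributes — can be sketched as an eigendecomposition of the covariance matrix. The data below is synthetic and the variable names are illustrative, not taken from the paper:

```python
import numpy as np

# Synthetic example: 200 records with 5 related attributes,
# each a noisy copy of one underlying quantity.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(200, 1)) for _ in range(5)])

# Centre the attributes and form the sample covariance matrix.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(Xc) - 1)

# Eigendecomposition: each eigenvector holds the weights defining one
# principal component; its eigenvalue is the variance that component captures.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # sort components by variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the leading components: the condensed attributes.
components = Xc @ eigvecs[:, :2]
explained = eigvals / eigvals.sum()
print(f"variance captured by first component: {explained[0]:.3f}")
```

Because the five attributes here are strongly correlated, almost all the variance concentrates in the first component, illustrating why a few such components can stand in for many original attributes. The eigendecomposition itself is the step the abstract suggests accelerating with parallel eigenvalue algorithms.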
