Abstract

We discuss a knowledge discovery in databases (KDD) tool which condenses a large number of related attributes into a few that are weighted combinations of the original attributes. These new attributes, known as principal components in the statistical literature, may be used either to preprocess the data for input to other KDD tools or, with domain knowledge, to identify patterns. We have also developed an architecture and data model for the management of heterogeneous distributed databases which facilitates the efficient extraction of distributed data. Efficient processing is a necessary precursor to KDD in such large and heterogeneous databases, which is where the full potential of data mining is likely to be realised. We concentrate on providing a fast and efficient DBMS for large, distributed, heterogeneous databases; such a structure is essential if KDD is to become a practicable methodology. However, there is also scope for improving the performance of the statistical algorithm which extracts the principal components, by exploiting fast eigenvalue algorithms developed for parallel architectures and parallel hardware.
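The condensation step the abstract describes can be sketched as a standard principal component analysis: centre the data, eigendecompose the covariance matrix of the attributes, and keep the leading eigenvectors as the new weighted-combination attributes. This is a minimal illustrative sketch, not the paper's implementation; the function name and example data are assumptions for illustration.

```python
import numpy as np

def principal_components(X, k):
    """Condense the attributes of X (n_samples x n_features) into the
    top-k principal components. Returns the weight matrix W (each column
    is a weighted combination of the original attributes) and the data
    projected onto those components."""
    # Centre each attribute at zero mean.
    Xc = X - X.mean(axis=0)
    # Covariance matrix of the attributes.
    cov = np.cov(Xc, rowvar=False)
    # eigh handles the symmetric covariance matrix; eigenvalues ascend.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the k eigenvectors with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1]
    W = eigvecs[:, order[:k]]
    return W, Xc @ W

# Example: five correlated attributes condensed to two components.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))                       # two latent factors
X = base @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(200, 5))
W, scores = principal_components(X, 2)
```

For large distributed databases of the kind the abstract targets, this serial eigendecomposition is the bottleneck the closing sentence refers to; parallel eigenvalue algorithms attack exactly this step.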
