PCA for heterogeneous data sets in a distributed data mining

E Chandra, P Ajitha

doi:10.1145/1980422.1980451

Abstract

Principal Component Analysis (PCA) is a cornerstone of distributed data analysis. Centralized data mining limits the scalability of data analysis. Both unsupervised techniques such as clustering and supervised classification techniques depend heavily on dimensionality reduction. PCA provides a basis for reducing both the dimensionality of the data and the communication bandwidth required. Given these issues, data mining can become costly once its distributed aspects are taken into account. This paper addresses the algorithmic aspects of both homogeneous and heterogeneous databases in distributed data mining. Integrating heterogeneous databases is tedious, and the likelihood of errors that must be handled is high. This paper proposes a new algorithm for heterogeneous data that also incorporates the error components.
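How PCA can save communication bandwidth is easiest to see in the simpler homogeneous case, where every site holds different rows over the same attributes. The following is a minimal sketch under that assumption only; it is not the heterogeneous algorithm proposed in this paper. Each site sends its row count, column sums and sum of outer products (on the order of d² numbers rather than its raw rows), and a coordinator merges them into the global covariance matrix and extracts the leading principal component by power iteration. The function names (`local_stats`, `merge_covariance`, `top_component`, `project`) are hypothetical, and plain Python is used to keep the sketch dependency-free.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def local_stats(rows: Matrix) -> Tuple[int, Vector, Matrix]:
    """Statistics one site sends: row count, column sums, sum of outer products."""
    d = len(rows[0])
    s = [0.0] * d
    ss = [[0.0] * d for _ in range(d)]
    for x in rows:
        for i in range(d):
            s[i] += x[i]
            for j in range(d):
                ss[i][j] += x[i] * x[j]
    return len(rows), s, ss


def merge_covariance(stats: List[Tuple[int, Vector, Matrix]]) -> Tuple[Vector, Matrix]:
    """Coordinator step: combine per-site statistics into global mean and population covariance."""
    d = len(stats[0][1])
    n = sum(count for count, _, _ in stats)
    s = [sum(st[1][i] for st in stats) for i in range(d)]
    ss = [[sum(st[2][i][j] for st in stats) for j in range(d)] for i in range(d)]
    mean = [v / n for v in s]
    # Cov = E[x x^T] - mean mean^T
    cov = [[ss[i][j] / n - mean[i] * mean[j] for j in range(d)] for i in range(d)]
    return mean, cov


def top_component(cov: Matrix, iters: int = 500) -> Vector:
    """Leading eigenvector of a symmetric matrix by power iteration.

    Assumes the all-ones start vector is not orthogonal to that eigenvector.
    """
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project(rows: Matrix, mean: Vector, v: Vector) -> Vector:
    """Score of each centered row along the principal direction v."""
    return [sum((x[i] - mean[i]) * v[i] for i in range(len(v))) for x in rows]
```

For example, with two hypothetical sites holding `[[2.0, 1.9], [0.0, 0.1]]` and `[[1.0, 1.0], [3.0, 3.1]]`, `merge_covariance([local_stats(a), local_stats(b)])` yields the same covariance as running `local_stats` on the pooled rows, and `top_component` returns a direction close to the diagonal. The "sum of squares minus squared mean" formula is chosen for brevity; a careful implementation would merge centered scatter matrices to avoid cancellation error.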
