Abstract

The sources of abundant data are constantly expanding as data collection methodologies spread across applications such as medicine, insurance, science, bioinformatics and business. These data sets may be geographically distributed and large in both size and dimensionality. Analyzing them to uncover hidden patterns conventionally requires downloading the data to a centralized site, which is challenging given the limited bandwidth available and is also computationally expensive. The covariance matrix is one method of estimating the relation between any two dimensions. In this paper we propose a communication-efficient algorithm to estimate the covariance matrix in a distributed manner. The global covariance matrix is computed by merging the local covariance matrices using a distributed approach. The results show that it is exactly the same as the matrix produced by the centralized method, with good speed-up in computation. The speed-up comes from constructing the local covariances in parallel and distributing the cross-covariances among the nodes so that the load is balanced. The results are analyzed on various partitions of the Mfeat data set, which also addresses scalability.
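To illustrate why merging local statistics can reproduce the centralized covariance matrix exactly, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes the records (rows) are split across nodes, that each node sends only its record count, mean vector and scatter matrix, and that summaries are combined with the standard pairwise update. The function names `local_summary`, `merge` and `global_covariance` are hypothetical, and the paper's own partitioning and its handling of cross-covariances may differ.

```python
from functools import reduce


def local_summary(rows):
    """Summarise one node's partition as (count, mean vector, scatter matrix)."""
    n = len(rows)
    d = len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    # Scatter matrix: sum over rows of (x - mean)(x - mean)^T.
    scatter = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows)
                for j in range(d)] for i in range(d)]
    return n, mean, scatter


def merge(a, b):
    """Combine two summaries; only O(d^2) numbers per node are exchanged."""
    n_a, mean_a, s_a = a
    n_b, mean_b, s_b = b
    n = n_a + n_b
    d = len(mean_a)
    delta = [mean_b[j] - mean_a[j] for j in range(d)]
    mean = [mean_a[j] + delta[j] * n_b / n for j in range(d)]
    # Correction term accounts for the difference between the two local means.
    w = n_a * n_b / n
    scatter = [[s_a[i][j] + s_b[i][j] + w * delta[i] * delta[j]
                for j in range(d)] for i in range(d)]
    return n, mean, scatter


def global_covariance(partitions):
    """Sample covariance of the union of all partitions (assumed layout)."""
    n, _, scatter = reduce(merge, map(local_summary, partitions))
    return [[v / (n - 1) for v in row] for row in scatter]
```

In this sketch each call to `local_summary` is independent, so the local summaries could be built in parallel on separate nodes, and the raw records never leave the node that holds them.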
