Abstract

The Big Data era presents many opportunities to use data mining techniques to discover knowledge patterns across large and diverse collections of data whose volume is growing exponentially. Recent approaches to Distributed Data Mining (DDM) have focused on addressing the heterogeneous nature of data sources. However, such approaches do not prioritize reducing data communication costs, which can be prohibitive in large-scale sensor networks where bandwidth is a limited resource. In fact, high communication and computational costs are the two most prominent problems encountered in heterogeneous distributed environments. Moreover, an effort to decrease the communication load in a distributed environment has an adverse effect on classification accuracy. The research challenge therefore lies in balancing transmission cost, computational cost, and accuracy. This paper proposes an algorithm, Performance Optimizer in Distributed Stream Mining (PODSM), based on Bayesian inference, to reduce the communication volume and resource time in a heterogeneous distributed data mining environment while retaining prediction accuracy. The approach computes statistics from past data and then applies these statistics to new data; in other words, it gives the system the ability to learn from experience. Our experimental evaluation shows that a significant reduction in communication load and an improvement in classification response time can be achieved across a diverse range of dataset types. For one of the datasets, communication overhead was reduced by 34.66% and resource time by nearly 27%. Importantly, rather than suffering a loss in accuracy, this dataset showed an increase of 0.44% in accuracy.
