Abstract

Distributed Predictive Analytics (DPA) refers to constructing predictive models from data distributed across nodes. DPA reduces the need for data centralization, thereby alleviating concerns about data privacy, decreasing the load on central servers, and minimizing communication overhead. However, data collected by nodes are inherently heterogeneous: nodes can differ in their data distributions, volumes, access patterns, and feature spaces. This heterogeneity hinders the development of accurate models in a distributed fashion. Many state-of-the-art methods adopt random node selection as a straightforward approach. Such a method is particularly ineffective under data and access-pattern heterogeneity, as it increases the likelihood of selecting nodes with low-quality or irrelevant data for DPA. Consequently, the most suitable nodes can be identified only after models have been trained over randomly selected nodes, based on their predictive performance. This results in higher time and resource consumption and increased network load. We argue that holistic knowledge of nodes' data characteristics and access patterns is crucial, because it enables the selection of a subset of suitable nodes for each DPA task (query) before model training. Our method engages the most suitable nodes by predicting their relevant distributed data and learning predictive models per query. We introduce a novel query-centric DPA mechanism for node and relevant data selection. We contribute (i) predictive selection mechanisms based on the availability and relevance of data per DPA query and (ii) various distributed machine learning mechanisms that engage the most suitable nodes for model training. We evaluate the efficiency of our mechanism and provide a comparative assessment against other methods from the literature. Our experiments show that our mechanism significantly outperforms other approaches applicable to DPA.

