Abstract

In this paper, a new approach for centralised and distributed learning from heterogeneous spatial databases is proposed. The centralised algorithm consists of spatial clustering followed by local regression, which learns the relationships between driving attributes and the target variable inside each region identified through clustering. For distributed learning, similar regions in multiple databases are first discovered by applying a spatial clustering algorithm independently on all sites and then identifying corresponding clusters on participating sites. Local regression models are built on the identified clusters and transferred among the sites, where the models responsible for corresponding regions are combined. In extensive experiments on spatial data sets with missing and irrelevant attributes, and with different levels of noise, both the centralised and distributed methods achieved higher prediction accuracy than global models. In addition, the experiments indicate that both methods are computationally more efficient than the global approach, owing to the smaller data sets used for learning. Furthermore, the accuracy of the distributed method was comparable to that of the centralised approach, making it a viable alternative to moving all data to a central location.
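
The centralised pipeline described above (spatial clustering, then one regression model per region) can be illustrated with a minimal sketch. The abstract does not name the clustering or regression algorithms, so the code below makes its own assumptions: plain k-means on (x, y) coordinates with farthest-first seeding stands in for the spatial clustering step, and one-attribute ordinary least squares stands in for local regression. All function names (`kmeans`, `fit_line`, `fit_local_models`, `predict`) are hypothetical and are not taken from the paper.

```python
from statistics import mean

def dist2(p, q):
    """Squared Euclidean distance between two (x, y) locations."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def nearest(p, centers):
    """Index of the center closest to location p."""
    return min(range(len(centers)), key=lambda c: dist2(p, centers[c]))

def kmeans(points, k, iters=20):
    """Plain k-means (an assumed stand-in for the paper's spatial clustering)."""
    # Deterministic farthest-first seeding: start from the first point, then
    # repeatedly add the point farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = (mean(m[0] for m in members), mean(m[1] for m in members))
    return centers

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single driving attribute."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx

def fit_local_models(coords, xs, ys, k):
    """Cluster the locations, then fit one regression model per cluster."""
    centers = kmeans(coords, k)
    labels = [nearest(p, centers) for p in coords]
    models = {}
    for c in range(k):
        idx = [i for i, l in enumerate(labels) if l == c]
        if idx:
            models[c] = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
    return centers, models

def predict(coord, x, centers, models):
    """Predict with the model of the nearest cluster that has a fitted model."""
    c = min(models, key=lambda c: dist2(coord, centers[c]))
    a, b = models[c]
    return a * x + b
```

To use it, call `fit_local_models(coords, xs, ys, k)` on the training data and pass the returned `centers` and `models` to `predict` for each new location. In the distributed setting described in the abstract, each site would run the same two steps independently and exchange only the fitted models; how corresponding clusters are matched across sites is not specified in the abstract and is left out of this sketch.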
