Abstract

Since large amount of geospatial data are produced by various sources and stored in incompatible formats, geospatial data integration is difficult because of the shortage of semantics. Despite standardised data format and data access protocols, such as Web Feature Service (WFS), can enable end-users with access to heterogeneous data stored in different formats from various sources, it is still time-consuming and ineffective due to the lack of semantics. To solve this problem, a prototype to implement the geospatial data integration is proposed by addressing the following four problems, i.e., geospatial data retrieving, modeling, linking and integrating. First, we provide a uniform integration paradigm for users to retrieve geospatial data. Then, we align the retrieved geospatial data in the modeling process to eliminate heterogeneity with the help of Karma. Our main contribution focuses on addressing the third problem. Previous work has been done by defining a set of semantic rules for performing the linking process. However, the geospatial data has some specific geospatial relationships, which is significant for linking but cannot be solved by the Semantic Web techniques directly. We take advantage of such unique features about geospatial data to implement the linking process. In addition, the previous work will meet a complicated problem when the geospatial data sources are in different languages. In contrast, our proposed linking algorithms are endowed with translation function, which can save the translating cost among all the geospatial sources with different languages. Finally, the geospatial data is integrated by eliminating data redundancy and combining the complementary properties from the linked records. We mainly adopt four kinds of geospatial data sources, namely, OpenStreetMap(OSM), Wikmapia, USGS and EPA, to evaluate the performance of the proposed approach. The experimental results illustrate that the proposed linking method can get high performance in generating the matched candidate record pairs in terms of Reduction Ratio(RR), Pairs Completeness(PC), Pairs Quality(PQ) and F-score. The integrating results denote that each data source can get much Complementary Completeness(CC) and Increased Completeness(IC).

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call