Abstract

In order to make public data more useful, it is necessary to provide relevant datasets that meet users' needs. We introduce a method for deriving linkages between the fields of structured datasets provided by public data portals. We define a dataset and the connectivity between datasets, where connectivity is based on the dataset metadata and on the linkage between the actual data field names and values. We constructed standard field names and, based on this standard, established relationships between the datasets. This paper covers 31,692 structured datasets from the public data portal (as of May 31, 2020), from which we extracted 1,185,846 field names. An analysis of these field names showed that those related to spatial information were the most common, at 35%. We therefore verified the method of deriving relationships between datasets by focusing on field names classified as spatial information, and for this purpose we defined spatial standard field names. To derive similar field names, we extracted field names related to spatial concepts used in public datasets, such as locations, coordinates, addresses, and zip codes. The designed spatial standard field names achieved a 43% cooperation rate across the 31,692 datasets. In the future, we plan to add similar field names to improve the dataset cooperation rate of the spatial information standard.
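The field-name standardization and the cooperation rate described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the synonym table `SPATIAL_STANDARD`, the helpers `normalize`, `standardize`, `spatial_field_share`, and `linkage_rate`, and the toy datasets are all hypothetical, and the cooperation rate is assumed here to mean the share of datasets with at least one field mapped to a spatial standard field name.

```python
# Hypothetical synonym table mapping normalized raw field names to spatial
# standard field names. The paper's actual standard is not reproduced here.
SPATIAL_STANDARD = {
    "address": "ADDRESS",
    "addr": "ADDRESS",
    "road_address": "ADDRESS",
    "zip_code": "ZIP_CODE",
    "postal_code": "ZIP_CODE",
    "latitude": "LATITUDE",
    "lat": "LATITUDE",
    "longitude": "LONGITUDE",
    "lon": "LONGITUDE",
    "lng": "LONGITUDE",
}


def normalize(field_name: str) -> str:
    """Lower-case and unify separators so 'Zip Code' and 'zip-code' compare equal."""
    return field_name.strip().lower().replace(" ", "_").replace("-", "_")


def standardize(field_names: list[str]) -> set[str]:
    """Return the spatial standard names matched by one dataset's field names."""
    return {SPATIAL_STANDARD[n] for n in map(normalize, field_names) if n in SPATIAL_STANDARD}


def spatial_field_share(datasets: dict[str, list[str]]) -> float:
    """Fraction of all extracted field names that map to a spatial standard name."""
    names = [normalize(f) for fields in datasets.values() for f in fields]
    if not names:
        return 0.0
    return sum(n in SPATIAL_STANDARD for n in names) / len(names)


def linkage_rate(datasets: dict[str, list[str]]) -> float:
    """Assumed cooperation rate: share of datasets with at least one standardized spatial field."""
    if not datasets:
        return 0.0
    matched = sum(1 for fields in datasets.values() if standardize(fields))
    return matched / len(datasets)


# Toy input for illustration only (not data from the paper).
toy_datasets = {
    "parking_lots": ["Parking Lot Name", "Road Address", "Latitude", "Longitude"],
    "bus_stops": ["stop_id", "lat", "lng"],
    "budget": ["year", "department", "amount"],
}
print(f"{linkage_rate(toy_datasets):.2f}")         # 0.67
print(f"{spatial_field_share(toy_datasets):.2f}")  # 0.50
```

An exact-match dictionary keeps the mapping easy to audit; raising coverage then amounts to adding more similar field names to the table, which matches the direction the abstract names for future work.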

Highlights

  • Public datasets are created by each institution, so it is difficult to confirm connectivity between field names or data values that have the same meaning but different open formats.

  • Korea ranked first in the OECD public data openness index in 2019, with a score of 0.93.

  • The OECD public data index consists of three areas (data availability, data access, and government support for data utilization), and South Korea remains at the top in all three areas[1].


Summary

Introduction

Public datasets are created by each institution, so it is difficult to confirm connectivity between field names or data values that have the same meaning but different open formats. To solve these problems, we defined each list of data opened on the public data portal as a dataset, and defined connectivity between datasets based on the dataset metadata and on the linkage between the actual data field names and values. For this purpose, we collected the public data portal datasets and extracted their metadata and their actual data field names and values. Common metadata values and field names are then derived across datasets and standardized to determine connectivity between datasets.
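One way to turn standardized field names and common metadata values into connectivity between datasets is an inverted index: each key points to the datasets that contain it, and any two datasets sharing a key are linked. The sketch below is an illustration under assumptions rather than the authors' method; the function `build_links`, the `field:` and `meta:` key prefixes, and the toy input are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_links(keys_by_dataset: dict[str, set[str]]) -> dict[tuple[str, str], list[str]]:
    """Link every pair of datasets that share at least one key.

    Keys are hypothetical strings such as "field:LATITUDE" (a standardized
    field name) or "meta:category=transport" (a common metadata value).
    Returns {(dataset_a, dataset_b): sorted shared keys}, with a < b.
    """
    # Inverted index: key -> ids of datasets that contain the key.
    index: dict[str, set[str]] = defaultdict(set)
    for ds_id, keys in keys_by_dataset.items():
        for key in keys:
            index[key].add(ds_id)

    # Every pair of datasets listed under the same key is connected via that key.
    shared: dict[tuple[str, str], set[str]] = defaultdict(set)
    for key, ds_ids in index.items():
        for a, b in combinations(sorted(ds_ids), 2):
            shared[(a, b)].add(key)
    return {pair: sorted(keys) for pair, keys in shared.items()}


# Toy input for illustration only (not data from the paper).
toy_keys = {
    "parking_lots": {"field:ADDRESS", "field:LATITUDE", "field:LONGITUDE", "meta:category=transport"},
    "bus_stops": {"field:LATITUDE", "field:LONGITUDE", "meta:category=transport"},
    "budget": {"meta:category=finance"},
}
links = build_links(toy_keys)
for pair, keys in links.items():
    print(pair, keys)
```

Materializing every pair grows quadratically with the number of datasets sharing a key, so for a common key held by thousands of portal datasets it may be preferable to keep the inverted index itself as the link structure and enumerate pairs only on demand.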
