Matrix-Based Method for Inferring Elements in Data Attributes Using a Vector Space Model

Teruaki Hayashi, Yukio Ohsawa

doi:10.3390/info10030107

Abstract

This article addresses the task of inferring elements in the attributes of data. Extracting data related to our interests is a challenging task. Although data on the web can be accessed through free text queries, it is difficult to obtain results that accurately correspond to user intentions because users might not express their objects of interest using the exact terms (variables, outlines of data, etc.) found in the data. In other words, users do not always have sufficient knowledge of the data to formulate an effective query. Hence, we propose a method that infers the type, format, and variable elements as attributes of data when a natural language summary of the data is provided as a free text query. To evaluate the proposed method, we used Data Jacket datasets, whose metadata is written in natural language. The experimental results indicate that our method outperforms string matching and word embedding. Applications based on this study can support users who wish to retrieve or acquire new data.
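
To make the contrast between string matching and a vector space approach concrete, the following is a minimal sketch and not the authors' matrix-based method. It assumes a hypothetical toy collection `DATASETS` of outlines and variable labels, and it uses plain bag-of-words cosine similarity as a stand-in for the vector space model. The function names `infer_by_string_matching` and `infer_by_vector_space` are illustrative only.

```python
import math
from collections import Counter

# Hypothetical toy metadata (not from the paper): each dataset has a
# natural-language outline and a list of variable labels.
DATASETS = [
    {"outline": "daily weather observations for tokyo including rain and wind",
     "variables": ["temperature", "precipitation", "wind speed"]},
    {"outline": "hourly weather forecast with rain probability",
     "variables": ["temperature", "precipitation", "humidity"]},
    {"outline": "supermarket sales records per store",
     "variables": ["price", "quantity", "store id"]},
]


def bag_of_words(text):
    """Count whitespace-separated, lower-cased tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def infer_by_string_matching(query):
    """Baseline: return variable labels that literally occur in the query."""
    q = query.lower()
    labels = sorted({v for d in DATASETS for v in d["variables"]})
    return [v for v in labels if v in q]


def infer_by_vector_space(query, top_k=3):
    """Score each variable by the summed similarity between the query
    and the outlines of the datasets that contain that variable."""
    q = bag_of_words(query)
    scores = Counter()
    for d in DATASETS:
        sim = cosine(q, bag_of_words(d["outline"]))
        for v in d["variables"]:
            scores[v] += sim
    # Sort by descending score, then alphabetically to break ties.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [v for v, s in ranked[:top_k] if s > 0]


if __name__ == "__main__":
    query = "rain and weather data for tokyo"
    print(infer_by_string_matching(query))
    print(infer_by_vector_space(query))
```

In this toy setting the query never names a variable label, so the string-matching baseline returns nothing, whereas the similarity-based sketch still ranks variables drawn from the datasets whose outlines resemble the query.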

Highlights

The global trends of big data and artificial intelligence (AI) have introduced various types of data that cannot be handled by the existing analytical technologies; attention to areas not centered on AI technologies has increased.
We introduced string matching (TSM) and Doc2vec [19,20] as baseline methods for comparison with the proposed approach, because a method based on string matching against the elements of the attributes can be applied when data are retrieved based on their descriptions.
Because queries do not always include the terms found in the data, users who wish to acquire new data cannot discover information regarding what types of data should be obtained for their decision making.

Summary

Introduction

The global trends of big data and artificial intelligence (AI) have introduced various types of data that cannot be handled by the existing analytical technologies; attention to areas not centered on AI technologies has increased. Rather than relying on a single data source, methods have been proposed to solve such problems and obtain new value from data through the distribution, exchange, and linking of data across various fields. A data market has been developed in which various stakeholders exchange data and information about the data across different fields [1,2]. Various stakeholders have discussed the potential benefits of reusing and analyzing massive amounts of data [5,6]. Such reuse and analysis typically affect data privacy and security [7,8,9,10].

Objectives

Findings

Discussion

Conclusion