Blind Queries Applied to JSON Document Stores

Stefania Marrara,Mauro Pelucchi,Giuseppe Psaila

doi:10.3390/info10100291

Abstract

Social media, Web portals and, in general, information systems offer their own Application Programming Interfaces (APIs), used to provide large data sets concerning every aspect of everyday life. APIs usually provide data sets as collections of JSON documents. The heterogeneous structure of JSON documents returned by different APIs constitutes a barrier to effectively querying and analyzing these data sets. The adoption of NoSQL document stores, such as MongoDB, is useful for gathering these data sets, but does not solve the problem of querying the final heterogeneous repository. The aim of this paper is to provide analysts with a tool, named HammerJDB, that allows blind querying of collections of JSON documents within a NoSQL document database. The underlying idea is that users may know the application domain but may not be aware of the actual structures of the documents stored in the database; the tool for blind querying tries to bridge this gap by adopting a query rewriting mechanism. This paper is an evolution of a technique for blind querying Open Data portals and of its implementation within the Hammer framework, presented in previous work. Here, we extend that approach to NoSQL document databases by evolving the Hammer framework into the HammerJDB framework, which is able to work on MongoDB databases. The effectiveness of the new approach is evaluated on a data set (derived from a real-life one) containing job-vacancy ads collected from European job portals.
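To make the idea of query rewriting over heterogeneous documents more concrete, the following is a minimal sketch and not the HammerJDB algorithm, whose details are not given in this abstract. It assumes that rewriting can be approximated by plain string similarity between the field name the user guesses and the field names actually present in the collection. The sample documents, the function names (`collect_fields`, `similar_fields`, `rewrite`, `run`) and the 0.6 threshold are all hypothetical, and an in-memory list stands in for a MongoDB collection.

```python
from difflib import SequenceMatcher

# Hypothetical job-vacancy documents, as if gathered from different portals:
# the same concept appears under different field names.
docs = [
    {"title": "Java Developer", "country": "IT"},
    {"job_title": "Data Analyst", "nation": "DE"},
    {"jobTitle": "Java Developer", "country_code": "IT"},
]

def collect_fields(documents):
    """Return the set of all top-level field names found in the documents."""
    fields = set()
    for doc in documents:
        fields.update(doc.keys())
    return fields

def similar_fields(name, fields, threshold=0.6):
    """Return field names whose normalized spelling is similar to `name`.

    Assumption: lexical similarity is a good enough stand-in for the
    (unspecified here) matching used by the real framework.
    """
    def norm(s):
        return s.lower().replace("_", "")
    return sorted(
        f for f in fields
        if SequenceMatcher(None, norm(name), norm(f)).ratio() >= threshold
    )

def rewrite(field, value, fields, threshold=0.6):
    """Rewrite one blind condition into one query per candidate field."""
    return [{f: value} for f in similar_fields(field, fields, threshold)]

def run(queries, documents):
    """Return documents that satisfy at least one rewritten query."""
    return [
        doc for doc in documents
        if any(all(doc.get(k) == v for k, v in q.items()) for q in queries)
    ]

# The user only guesses that a field called "title" exists.
fields = collect_fields(docs)
queries = rewrite("title", "Java Developer", fields)
matches = run(queries, docs)
print(queries)
print(matches)
```

In this sketch the single blind condition on `title` is expanded into three queries, one per similar field name, so documents that store the job title under `jobTitle` are also retrieved. Against an actual MongoDB collection, the same rewritten conditions could be combined with the `$or` operator.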

Highlights

Modern portals for publishing advertisements related to any economic field offer WebServices Application Programming Interfaces (APIs) that allow software systems to publish and retrieve ads
We present the HammerJDB framework, the first evolution of the Hammer framework towards applying the blind querying technique to JSON document stores
JSON document stores, by exploiting a data set derived from a corpus of real-life job vacancy ads

Summary

Introduction

Modern portals for publishing advertisements (ads) related to any economic field offer WebServices Application Programming Interfaces (APIs) that allow software systems to publish and retrieve ads. By using the provided API, analysts can gather an impressive amount of data for analysis purposes, for example to study the dynamics of social phenomena, possibly on an international scale. An interesting application context is offered by the continuously growing diffusion of Web portals for the labour market (such as job portals and employment Web sites); the on-line availability of ads concerning job vacancies enables new research activities for analyzing the labour market and its dynamics [1]. Analysts first gather the data sets to be analyzed, possibly collecting data over a long period, and then carry out the analysis itself. The data sets returned by Web services are usually provided as collections of JSON documents. JSON (JavaScript Object Notation) has become a common syntactic framework that is independent of the specific application context.

Journal: Information
Publication Date: Sep 21, 2019
Citations: 6
License type: CC BY 4.0
