Abstract

Social media, Web portals and, more generally, information systems offer their own Application Programming Interfaces (APIs), which provide large data sets concerning every aspect of daily life. APIs usually provide data sets as collections of JSON documents. The heterogeneous structure of the JSON documents returned by different APIs is a barrier to effectively querying and analyzing these data sets. Adopting a NoSQL document store, such as MongoDB, is useful for gathering these data sets, but does not solve the problem of querying the resulting heterogeneous repository. The aim of this paper is to provide analysts with a tool, named HammerJDB, that supports blind querying of collections of JSON documents within a NoSQL document database. The underlying idea is that users may know the application domain but may not be aware of the actual structure of the documents stored in the database; the blind querying tool tries to bridge this gap by adopting a query rewriting mechanism. This paper builds on a technique for blind querying Open Data portals and on its implementation within the Hammer framework, both presented in previous work. Here, we extend that approach to NoSQL document databases by evolving the Hammer framework into the HammerJDB framework, which is able to work on MongoDB databases. The effectiveness of the new approach is evaluated on a data set (derived from a real-life one) containing job-vacancy ads collected from European job portals.

Highlights

  • Modern portals for publishing advertisements in any economic sector offer Web Services Application Programming Interfaces (APIs) that allow software systems to publish and retrieve ads

  • We present the HammerJDB framework, the first evolution of the Hammer framework towards applying the blind querying technique to JSON document stores

  • The approach is evaluated on JSON document stores, using a data set derived from a corpus of real-life job vacancy ads

Introduction

Modern portals for publishing advertisements (ads) in any economic sector offer Web Services Application Programming Interfaces (APIs) that allow software systems to publish and retrieve ads. By using the provided APIs, analysts can gather a large amount of data for analysis purposes, for example to study the dynamics of social phenomena, possibly on an international scale. An interesting application context is offered by the continuously growing diffusion of Web portals for the labour market (such as job portals, employment Web sites and so on); the on-line availability of ads concerning job vacancies enables new research activities for analyzing the labour market and its dynamics [1]. Analysts first gather the desired data sets, possibly collecting data over a long period; they then carry out the analysis itself. The data sets returned by Web services are usually provided as collections of JSON documents. JSON (acronym for JavaScript Object Notation) has become a common syntactic framework that is independent of the specific application context
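To make the heterogeneity problem concrete, the following minimal sketch shows job-vacancy documents with different field names and a naive way of querying them without knowing their exact structure. It is an illustration only and not the HammerJDB algorithm: the sample documents, the function names `similar_fields` and `blind_select`, and the 0.6 threshold are all assumptions, and plain string similarity from Python's standard `difflib` stands in for the paper's query rewriting mechanism.

```python
from difflib import SequenceMatcher

# Hypothetical job-vacancy ads, shaped as different portals might return them.
docs = [
    {"title": "Data Engineer", "country": "IT", "salary": 42000},
    {"job_title": "Data Engineer", "nation": "DE", "wage": 51000},
    {"jobTitle": "Nurse", "country_code": "FR"},
]


def _normalize(name):
    """Lower-case a field name and drop underscores before comparing."""
    return name.lower().replace("_", "")


def similar_fields(name, documents, threshold=0.6):
    """Return the field names found in the documents that resemble `name`.

    The threshold is an arbitrary assumption chosen for this example.
    """
    seen = {key for doc in documents for key in doc}
    target = _normalize(name)
    return sorted(
        key
        for key in seen
        if SequenceMatcher(None, target, _normalize(key)).ratio() >= threshold
    )


def blind_select(field, value, documents):
    """Select documents where any field resembling `field` equals `value`.

    This mimics rewriting one query into several, one per candidate field.
    """
    candidates = similar_fields(field, documents)
    return [
        doc for doc in documents
        if any(doc.get(key) == value for key in candidates)
    ]


if __name__ == "__main__":
    print(similar_fields("title", docs))
    print(blind_select("title", "Data Engineer", docs))
```

With this sample data, a query on `title` reaches the documents that use `job_title` and `jobTitle` as well. A query on `country` finds `country_code` but misses `nation`, which shows why purely lexical matching is only a crude stand-in for a real rewriting mechanism.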
