Take the best of big data: Just focus on some of its V's

Ladjel Bellatreche

doi:10.1109/edis.2017.8284019

Abstract

Big Data represents a new technology for managing data with high velocity, volume and variety, and it contributes to creating value for companies. As noted in [1], capturing all queries made on the company website or arriving through customer support calls, emails or chat lines, regardless of their outcome, may have significant value in identifying emerging trends. The Big Data era has largely contributed to accelerating the development of strategic plans issued by governments and research organizations, covering the management, exploitation and analysis of these data by taking into account the different V's of Big Data. Among these plans, we can cite for instance the development of: (a) large-scale platforms (e.g., data clusters, distributed data clusters), (b) Software Defined Environments (SDE) (e.g., IBM SDE), (c) advanced programming paradigms (e.g., MapReduce, Spark), (d) data analytics tools (e.g., RapidMiner, Google Fusion Tables, Solver), (e) visualization tools (e.g., Google Chart, Tableau, Oracle Visual Analyzer), and (f) high-quality and valuable Knowledge Bases (KBs), constructed either by academia (e.g., Cyc, DBpedia, Freebase, and YAGO) or by industry (e.g., Google Knowledge Graph, Facebook Knowledge Graph, Amazon Knowledge Graph, Credit Rating Agencies, Enterprise Knowledge Base). In this talk, we would like to foster the creation of a think tank dedicated to getting the best from the V's of Big Data and the efforts related to them, in order to revisit our research activities without compromising them. We also share the experience of our Model and Data Engineering Team at the LIAS Laboratory, ISAE-ENSMA, which aims at the design of data warehousing applications.
According to the literature, this design follows two main approaches: (i) a supply-driven approach (also called data-driven), which starts with an analysis of the operational data sources in order to identify all the available data, and (ii) a user-driven approach (also known as requirement-driven or goal-oriented), which stems from determining the information requirements of the different business users. Several studies and experiments show that relying on these two approaches entails a high risk for companies, since some functional requirements cannot be satisfied because the sources lack the relevant data. In parallel, reference studies have identified the crucial role of knowledge bases (KBs) in analytical tasks, since they offer analysts more entities (people, places, products, etc.). The availability of a large, high-quality KB is an asset for data warehouse designers and decision-makers who want to construct and exploit a valuable data warehouse. Faced with this situation, we present a value-driven approach that revisits the traditional life cycle of data warehouse design by considering the KB as an external resource. The phases of this life cycle are illustrated using the YAGO KB.
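The gap described above, where requirements cannot be satisfied from the sources and a KB is brought in as an external resource, can be illustrated with a minimal sketch. It is not the method presented in the talk: the function `check_coverage`, the class `CoverageReport`, and all concept names are hypothetical, and a small in-memory set stands in for a real KB such as YAGO.

```python
from dataclasses import dataclass, field


@dataclass
class CoverageReport:
    """Where each required concept can be found (hypothetical structure)."""
    from_sources: set = field(default_factory=set)
    from_kb: set = field(default_factory=set)
    unsatisfied: set = field(default_factory=set)


def check_coverage(required, source_attributes, kb_attributes):
    """Classify each required concept by where it can be obtained.

    Operational sources are checked first; the KB is consulted only
    for concepts that the sources cannot supply.
    """
    report = CoverageReport()
    for concept in required:
        if concept in source_attributes:
            report.from_sources.add(concept)
        elif concept in kb_attributes:
            report.from_kb.add(concept)
        else:
            report.unsatisfied.add(concept)
    return report


# Toy data (assumed for illustration only).
required = {"customer", "product", "city_population", "competitor_price"}
sources = {"customer", "product", "order_date"}
kb = {"city_population", "country"}  # stand-in for concepts found in a KB

report = check_coverage(required, sources, kb)
print("From sources:", sorted(report.from_sources))
print("From KB:", sorted(report.from_kb))
print("Unsatisfied:", sorted(report.unsatisfied))
```

In this toy case, `city_population` is a requirement that the sources alone could not satisfy but the external KB can, while `competitor_price` remains unsatisfied by both.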

Full Text