Abstract

Social media data have become an integral part of business data and should be integrated into the decision-making process, so that decisions rest on information that better reflects the true business situation in any field. However, social media data are unstructured and generated at a very high frequency that exceeds the capacity of the data warehouse. In this work, we propose to extend the data warehousing process with a staging area whose core is a large-scale system that implements an information extraction process using the Storm and Hadoop frameworks to better manage the volume and frequency of these data. For structured information extraction, mainly of events, we combine techniques from NLP, linguistic rules, and machine learning. Finally, we propose a data warehouse conceptual model for modeling events and integrating them with the enterprise data warehouse through an intermediate table called the bridge table. For application and experiments, we focus on extracting drug abuse events from Twitter data and modeling them in the Event Data Warehouse.

Highlights

  • Over the last two decades, enterprise information systems have been flooded with new kinds of data generated by the frequent and ubiquitous use of social media and mobile devices

  • The social media data warehouse is defined as SMDW: (F, D{}, HDi{}), where F is the fact table; D = {D1, ..., Dn} is the set of dimensions; and HDi{} is the set of hierarchies for each dimension Di, defined by HDi = {h1, ..., hk}. Event fact: the event fact represents an event extracted from social media and is considered the subject of analysis

  • The bridge table is defined as BT: (NameBT, A{}, O{}, DateO), where NameBT is the name of the bridge table; A = {a1, ..., az} is the set of attributes, which in general consists only of the primary key, itself composed of foreign keys referencing the primary key of the event fact table and the primary keys of the other fact tables in the existing system architecture; O = {o1, ..., on} is the set of operations that can be performed between the EDW and the SMDW; and DateO is the date recorded for each operation for future analysis purposes
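
The two definitions above can be read as plain data structures. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class names, the field names, the "Time" dimension with its hierarchy, the "EventSalesBridge" table and the "join" operation are all hypothetical, and storing the operation and its date on each bridge row is one possible interpretation of O{} and DateO.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class Dimension:
    """A dimension D_i with its hierarchies HD_i = {h_1, ..., h_k}."""
    name: str
    # Each hierarchy is an ordered list of levels, finest first (assumed layout).
    hierarchies: list[list[str]] = field(default_factory=list)


@dataclass
class SMDW:
    """SMDW: (F, D{}, HDi{}); hierarchies live on each Dimension."""
    fact: str                    # F: name of the event fact table
    dimensions: list[Dimension]  # D = {D_1, ..., D_n}


@dataclass(frozen=True)
class BridgeRow:
    """One bridge entry: a composite key made of foreign keys, plus the operation."""
    event_fact_id: int    # foreign key to the event fact table (SMDW side)
    edw_fact_id: int      # foreign key to an existing EDW fact table
    operation: str        # one element of O{}
    operation_date: date  # DateO, kept for later analysis


@dataclass
class BridgeTable:
    """BT: (NameBT, A{}, O{}, DateO)."""
    name: str             # NameBT
    operations: set[str]  # O = {o_1, ..., o_n}
    rows: list[BridgeRow] = field(default_factory=list)

    def link(self, event_fact_id: int, edw_fact_id: int,
             operation: str, operation_date: date) -> BridgeRow:
        """Record an operation between an event fact and an EDW fact."""
        if operation not in self.operations:
            raise ValueError(f"unknown operation: {operation!r}")
        row = BridgeRow(event_fact_id, edw_fact_id, operation, operation_date)
        self.rows.append(row)
        return row


# Hypothetical usage: one dimension, one allowed operation, one link.
smdw = SMDW(
    fact="EventFact",
    dimensions=[Dimension("Time", [["day", "month", "year"]])],
)
bridge = BridgeTable(name="EventSalesBridge", operations={"join"})
bridge.link(event_fact_id=1, edw_fact_id=42,
            operation="join", operation_date=date(2020, 1, 1))
```

Keeping the bridge rows separate from both fact tables mirrors the role described for the bridge table: neither the EDW schema nor the SMDW schema needs to change for the two to be related.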

Summary

INTRODUCTION

Over the last two decades, enterprise information systems have been flooded with new kinds of data generated by the frequent and ubiquitous use of social media and mobile devices. Social media have become a rich source of business information. Analyzing these data and integrating them into the data warehouse and the decision-making process has become a real business requirement: it can uncover hidden relationships, new insights, and knowledge that improve decision making and affect the entire business value chain. The researchers propose to extend the data warehouse architecture with big data technologies, namely Hadoop and Storm, so that the traditional data warehouse can cope with the volume and velocity of social media data. They develop a staging area that extracts manageable structured information.

STATE OF THE ART
ETL for Big Data
Conceptual Modeling of Big Data Warehouse
Large Scale Data Warehousing Architecture
STRUCTURED INFORMATION EXTRACTION
Event Data Model
Named Entity Recognition
Events Extraction
Drug Abuse Event
Multidimensional Model
Social Media Data Warehouse Schema
CONCLUSION
