A Novel Approach for Deciphering Big Data Value Using Dark Data

Surbhi Bhatia, Mohammed Alojail

doi:10.32604/iasc.2022.023501

Abstract

The last decade has seen a rapid increase in big data, which has led to a need for more tools that can help organizations in their data management and decision making. Business intelligence tools have removed many of the obstacles to data visibility, and numerous data mining technologies are playing an essential role in this visibility. However, the increase in big data has also led to an increase in ‘dark data’, data that does not have any predefined structure and is not generated intentionally. In this paper, we show how dark data can be mined for practical purposes and utilized to gain business insight. The most common type of dark data is a log file generated on a web server. Using the example of log files generated by e-commerce transactions, this paper shows how residual data and data trails can prove to be valuable when an actual dataset is inaccessible, and explains the usage of residual data for modeling purposes. The work uses a system identification approach, based on natural language processing for log file tokenization and feature extraction. The features are then embedded into the next step, which uses a deep neural network to identify customers for targeted advertising. The results achieve a significant accuracy and show how dark data has the potential to deliver value for business. Locating, organizing, and understanding dark data can unlock its relevance, usefulness, and potential monetization, but it is important to act when the benefits of use outweigh the costs of access and analysis.
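The tokenization and feature-extraction stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the log field names (`customer_id`, `method`, `path`, `status`), the sample log lines, and the bag-of-tokens representation are all assumptions chosen for illustration, and the deep neural network stage is omitted. The sketch only shows how residual log lines might be turned into per-customer numeric vectors that a downstream classifier could consume.

```python
import json
import re
from collections import Counter, defaultdict

# Hypothetical log lines; the field names are assumptions, not the paper's schema.
SAMPLE_LOG = """\
{"customer_id": "c1", "method": "GET", "path": "/product/shoes/view", "status": 200}
{"customer_id": "c1", "method": "POST", "path": "/cart/add", "status": 200}
{"customer_id": "c2", "method": "GET", "path": "/product/shoes/view", "status": 404}
not valid json
"""

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(record):
    """Turn one parsed log record into a list of lowercase tokens."""
    text = (
        f"{record.get('method', '')} {record.get('path', '')} "
        f"status{record.get('status', '')}"
    )
    return TOKEN_RE.findall(text.lower())


def extract_features(log_text):
    """Build a per-customer bag of tokens, skipping malformed lines."""
    features = defaultdict(Counter)
    for line in log_text.splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # dark data is messy; ignore lines that are not JSON
        if not isinstance(record, dict) or "customer_id" not in record:
            continue
        features[record["customer_id"]].update(tokenize(record))
    return features


def to_vectors(features):
    """Map each customer's token counts onto a shared, sorted vocabulary."""
    vocab = sorted({tok for counts in features.values() for tok in counts})
    vectors = {
        cid: [counts[tok] for tok in vocab] for cid, counts in features.items()
    }
    return vocab, vectors


if __name__ == "__main__":
    vocab, vectors = to_vectors(extract_features(SAMPLE_LOG))
    print(vocab)
    for cid, vec in vectors.items():
        print(cid, vec)
```

Count vectors keep the sketch dependency-free; a learned embedding or TF-IDF weighting could be substituted before the vectors are passed to a neural network.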

Highlights

The last decade has seen a rapid increase in big data, which has led to a need for more tools that can help organizations in their data management and decision making.
To assess the model’s accuracy, the contacts classified by the natural language processing (NLP)-based deep learning model, which analyzes log files to obtain its results, are compared with the results of the EBE model run on the actual dataset.
In the analysis based on the Ensemble-based Ensemble (EBE) model, the leads are caught by the algorithm and naturally appear in the JSON log file emitted by the e-commerce server.

Summary

Introduction

The world is interconnected, and data is a critical part of this interconnectedness. Companies rely on existing data to run their business operations efficiently. The rapid increase in data has also given rise to an increase in dark data. Cloud management companies have claimed that 52% of the data stored on their servers is dark data [6]. The phenomenon of dark data has become a challenge for data management, as well as an opportunity for businesses. A business collects, processes, and stores data in the course of its day-to-day activities, but most companies fail to use this data for other purposes [7,8]. Many businesses are unaware of the value of dark data and of the possibility of monetizing it [9]. Mining dark data means taking action to obtain useful information from such data, which can prevent a business from suffering a severe loss [10]. Some examples of dark data are given below.

Methods

Results

Conclusion