Abstract

During the recent era of big data, a huge volume of unstructured data are being produced in various forms of audio, video, images, text, and animation. Effective use of these unstructured big data is a laborious and tedious task. Information extraction (IE) systems help to extract useful information from this large variety of unstructured data. Several techniques and methods have been presented for IE from unstructured data. However, numerous studies conducted on IE from a variety of unstructured data are limited to single data types such as text, image, audio, or video. This article reviews the existing IE techniques along with its subtasks, limitations, and challenges for the variety of unstructured data highlighting the impact of unstructured big data on IE techniques. To the best of our knowledge, there is no comprehensive study conducted to investigate the limitations of existing IE techniques for the variety of unstructured big data. The objective of the structured review presented in this article is twofold. First, it presents the overview of IE techniques from a variety of unstructured data such as text, image, audio, and video at one platform. Second, it investigates the limitations of these existing IE techniques due to the heterogeneity, dimensionality, and volume of unstructured big data. The review finds that advanced techniques for IE, particularly for multifaceted unstructured big data sets, are the utmost requirement of the organizations to manage big data and derive strategic information. Further, potential solutions are also presented to improve the unstructured big data IE systems for future research. These solutions will help to increase the efficiency and effectiveness of the data analytics process in terms of context-aware analytics systems, data-driven decision-making, and knowledge management.

Highlights

  • Information extraction (IE) process is used to extract structured content in the form of entities, relations, facts, terms, and other types of information that helps the data analysis pipeline to prepare the data for analysis

  • IE techniques are important to extract useful information from unstructured data that improve the effectiveness of data analytics

  • This review investigated the effectiveness of existing IE techniques for unstructured big data

Read more

Summary

Introduction

Information extraction (IE) process is used to extract structured content in the form of entities, relations, facts, terms, and other types of information that helps the data analysis pipeline to prepare the data for analysis. The efficient and accurate transformation of unstructured data leads to improved performance of data analysis and IE. Various IE approaches have been proposed to extract structured and useful information from unstructured data which eventually helps to manage, process, and analyze unstructured data. IE systems are based on natural language processing (NLP), language modeling, and structure extraction technique. NLP and language modeling have a significant role in the IE process but not included in the scope of this review. This review mainly focused on the extraction techniques with respect to IE subtasks and different data types

Objectives
Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call