Abstract

In the health industry, the use of data (including Big Data) is of growing importance. The term ‘Big Data’ characterizes data by its volume, and also by its velocity, variety, and veracity. Big Data needs to have effective data governance, which includes measures to manage and control the use of data and to enhance data quality, availability, and integrity. The type and description of data quality can be expressed in terms of the dimensions of data quality. Well-known dimensions are accuracy, completeness, and consistency, amongst others. Since data quality depends on how the data is expected to be used, the most important data quality dimensions depend on the context of use and industry needs. There is a lack of current research focusing on data quality dimensions for Big Data within the health industry; this paper, therefore, investigates the most important data quality dimensions for Big Data within this context. An inner hermeneutic cycle research approach was used to review relevant literature related to data quality for big health datasets in a systematic way and to produce a list of the most important data quality dimensions. Based on a hierarchical framework for organizing data quality dimensions, the highest ranked category of dimensions was determined.

Highlights

  • Big Data refers to the capacity to work with datasets using tools different to those used with traditional relational databases [1]

  • Because identifying the data quality dimensions (DQDs) most important to Big Data within the health industry is a qualitative task, candidate research methods include interviews with data quality managers and an integrative review of existing literature using the inner hermeneutic cycle (IHC)

  • This research was set in a multidisciplinary context involving three main fields: data quality, Big Data, and health informatics

Summary

Introduction

Big Data refers to the capacity to work with datasets using tools different to those used with traditional relational databases [1]. Big Data is generally characterized by volume, variety, and velocity. Veracity, a further characteristic that is gaining acceptance, concerns the certainty or quality of the data being used. The healthcare sector is said to be a multi-trillion-dollar industry in the making [3], and it is an example of an industry that makes use of a huge amount of data. Big Data is being used to improve decision making in the healthcare industry by increasing the potential of the “small data” of evidence-based medicine (EBM) [4]. Personalized decision support systems (PDSS) are enhancing personalized medicine or evidence-based medicine through Big Data analytics [6].
