Big Data Solutions Research Articles

The emergence of massive datasets in clinical settings presents both challenges and opportunities for data storage and analysis. This so-called “big data” challenges traditional analytic tools and will increasingly require novel solutions adapted from other fields. Advances in information and communication technology present the most viable solutions to big data analysis in terms of efficiency and scalability. It is vital that these big data solutions be multithreaded and that data access approaches be precisely tailored to large volumes of semi-structured and unstructured data.

The MapReduce programming framework uses two tasks common in functional programming: Map and Reduce. MapReduce is a new parallel processing framework, and Hadoop is its open-source implementation, which runs on a single computing node or on clusters. Compared with existing parallel processing paradigms (e.g., grid computing and graphics processing units (GPUs)), MapReduce and Hadoop have two advantages: 1) fault-tolerant storage, which yields reliable data processing by replicating computing tasks and cloning data chunks on different computing nodes across the cluster; and 2) high-throughput data processing via a batch processing framework and the Hadoop Distributed File System (HDFS). Data are stored in HDFS and made available to the slave nodes for computation.

In this paper, we review the existing applications of the MapReduce programming framework and its implementation platform Hadoop in clinical big data and related medical health informatics fields. The use of MapReduce and Hadoop on a distributed system represents a significant advance in clinical big data processing and utilization, and opens up new opportunities in the emerging era of big data analytics. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and to highlight what might be needed to improve the outcomes of clinical big data analytics tools.

The paper concludes by summarizing the potential use of the MapReduce programming framework and the Hadoop platform to process huge volumes of clinical data in medical health informatics and related fields.
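The Map and Reduce tasks described in the abstract can be illustrated with a minimal single-process sketch in Python. It is not Hadoop code and is not taken from the reviewed paper: the records, the diagnosis codes, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical, and the grouping step that a real framework performs across cluster nodes is simulated here with an in-memory dictionary.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical toy records: (patient_id, diagnosis_code). Not real data.
RECORDS = [
    ("p1", "E11"),
    ("p2", "I10"),
    ("p3", "E11"),
    ("p4", "J45"),
    ("p5", "I10"),
    ("p6", "E11"),
]


def map_phase(record):
    """Map: turn one input record into a (key, value) pair."""
    _patient_id, code = record
    return (code, 1)


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: fold all values that share a key into a single result."""
    return key, reduce(lambda a, b: a + b, values, 0)


def run_job(records):
    """Run Map, then the simulated shuffle, then Reduce, on one machine."""
    pairs = map(map_phase, records)
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(RECORDS))
```

Because each call to `map_phase` depends only on its own record, and each call to `reduce_phase` only on the values for one key, both phases can in principle be spread over many nodes, which is the property a distributed implementation exploits.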

Data management research, systems, and technologies have drastically improved the availability of data analysis capabilities, particularly for non-experts, due in part to low entry barriers and reduced ownership costs (e.g., for data management infrastructures and applications). Major reasons for the widespread success of database systems and today's multibillion-dollar data management market include data independence, which separates physical representation and storage from the actual information, and declarative languages, which separate the program specification from its intended execution environment. In contrast, today's big data solutions do not offer data independence and declarative specification. As a result, big data technologies are mostly employed in newly established companies with IT-savvy employees or in large, well-established companies with big IT departments. We argue that current big data solutions will continue to fall short of widespread adoption because of usability problems, even though in-situ data analytics technologies achieve a good degree of schema independence. In particular, we consider the lack of a declarative specification to be a major roadblock, contributing to the scarcity of available data scientists and limiting the application of big data to IT-savvy industries. Data scientists currently have to spend a lot of time tuning their data analysis programs for specific data characteristics and a specific execution environment. We believe that the research community needs to bring the powerful concepts of declarative specification to current data analysis systems in order to achieve broad adoption of big data technology and effectively deliver on the promise that novel big data technologies offer.
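The contrast between a hand-written analysis program and a declarative specification can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the `sales` table, the `ROWS` data, and both function names are hypothetical, and SQLite from the Python standard library stands in for a declarative engine; it is not one of the big data systems the abstract discusses.

```python
import sqlite3

# Hypothetical toy data: (region, amount).
ROWS = [("north", 120.0), ("south", 80.0), ("north", 30.0), ("east", 50.0)]


def totals_imperative(rows):
    """Hand-written plan: the author fixes the loop and the data structure."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_declarative(rows):
    """Declarative: state the result wanted; the engine decides how to compute it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(totals_imperative(ROWS))
    print(totals_declarative(ROWS))
```

Both functions return the same totals, but only the first commits to an execution strategy. The query in the second says nothing about iteration order, storage layout, or parallelism, so the engine remains free to choose them, which is the separation of specification from execution environment that the abstract argues for.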

Articles published on Big Data Solutions

Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends.

Breaking the chains

Big Data in Healthcare - Defining the Digital Persona through User Contexts from the Micro to the Macro. Contribution of the IMIA Organizational and Social Issues WG.

EHR Big Data Deep Phenotyping. Contribution of the IMIA Genomic Medicine Working Group.

Big Data solutions on a small scale: Evaluating accessible high-performance computing for social research

New Design Principles for Effective Knowledge Discovery from Big Data

Obtain confidentiality or/and authenticity in Big Data by ID-based generalized signcryption

A Dashboard of an Education Data Portal using Big Data Solutions

Environmental Conditions’ Big Data Management and Cloud Computing Analytics for Sustainable Agriculture

Toward Scalable Systems for Big Data Analytics: A Technology Tutorial

Study of CDR Real-Time Query Based on Big Data Technologies

Core System Transformation and Big Data Re-Architecting

Global Trends in Medical Journal Publishing

Detection of the onset of agitation in patients with dementia: real-time monitoring and the application of big-data solutions
