Avoiding big data pitfalls.
Clinical decisions are based on a combination of inductive inference built on experience (ie, statistical models) and on deductions provided by our understanding of the workings of the cardiovascular system (ie, mechanistic models). In a similar way, computers can be used to discover new hidden patterns in the (big) data and to make predictions based on our knowledge of physiology or physics. Surprisingly, unlike humans throughout history, computers seldom combine inductive and deductive processes. An explosion of expectations surrounds the computer's inductive method, fueled by "big data" and popular trends. This article reviews the risks and potential pitfalls of this computer approach, where the lack of generality, selection or confounding biases, overfitting, or spurious correlations are among the commonplace flaws. Recommendations to reduce these risks include an examination of data through the lens of causality, the careful choice and description of statistical techniques, and an open research culture with transparency. Finally, the synergy between mechanistic and statistical models (ie, the digital twin) is discussed as a promising pathway toward precision cardiology that mimics the human experience.
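As a concrete illustration of one flaw this article names, the sketch below shows how screening many candidate predictors against a small sample produces strong correlations from pure noise; the sample sizes and variable names are arbitrary assumptions, not taken from the article.

```python
# Spurious correlation from mass screening: noise "biomarkers" vs. a noise outcome.
import numpy as np

rng = np.random.default_rng(0)
n_patients, n_features = 50, 2000                  # few samples, many candidate predictors
X = rng.normal(size=(n_patients, n_features))      # random "biomarkers" with no real signal
y = rng.normal(size=n_patients)                    # random "outcome"

# Pearson correlation of every candidate predictor with the outcome
Xc = (X - X.mean(axis=0)) / X.std(axis=0)
yc = (y - y.mean()) / y.std()
r = Xc.T @ yc / n_patients

print(f"strongest correlation found by chance alone: |r| = {np.abs(r).max():.2f}")
# Naive screening would report this as a "finding"; examining the data through the
# lens of causality is one of the safeguards the article recommends.
```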
- Research Article
- 10.1016/j.jmb.2025.169181
- Sep 1, 2025
- Journal of Molecular Biology
Artificial-intelligence-driven Innovations in Mechanistic Computational Modeling and Digital Twins for Biomedical Applications.
- Research Article
- 10.1111/ejss.13011
- Jul 9, 2020
- European Journal of Soil Science
Digital soil mapping (DSM) is an effective mapping technique that supports the increased need for quantitative soil data. In DSM, soil properties are correlated with environmental characteristics using statistical models such as regression. However, many of these relationships are explicitly described in mechanistic simulation models. Therefore, the mechanistic relationships can, in theory, replace the statistical relationships in DSM. This study aims to develop a mechanistic model to predict soil organic matter (SOM) stocks in Natura2000 areas of the Cantabria region (Spain). The mechanistic model is established in four steps: (a) identify major processes that influence SOM stocks, (b) review existing models describing the major processes and the respective environmental data that they require, (c) establish a database with the required input data, and (d) calibrate the model with field observations. The SOM stocks map resulting from the mechanistic model had a mean error (ME) of −2 t SOM ha⁻¹ and a root mean square error (RMSE) of 66 t SOM ha⁻¹. Lin's concordance correlation coefficient was 0.47 and the amount of variance explained (AVE) was 0.21. The results of the mechanistic model were compared to the results of a statistical model; the correlation coefficient between the two SOM stock maps was 0.8. This study illustrated that mechanistic soil models can be used for DSM, which brings new opportunities. Mechanistic models for DSM should be considered for mapping soil characteristics that are difficult to predict by statistical models, and for extrapolation purposes.
Highlights:
- Theoretically, mechanistic models can replace the statistical relationships in digital soil mapping.
- Mechanistic soil models were used to develop a mechanistic model for digital soil mapping that predicted SOM stocks.
- The applicability of the mechanistic approach needs to be explored for different soil properties and regions.
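For reference, the validation statistics quoted in this abstract (ME, RMSE, Lin's concordance correlation coefficient, AVE) can be computed as in the sketch below; the AVE definition used here (1 − SSE/SST) and the observed/predicted values are assumptions for illustration, not data from the study.

```python
# Sketch of the map-validation metrics reported above, for hypothetical
# observed vs. predicted SOM stocks (t SOM ha^-1).
import numpy as np

def validation_metrics(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    err = pred - obs
    me = err.mean()                                   # mean error (bias)
    rmse = np.sqrt((err ** 2).mean())                 # root mean square error
    # Lin's concordance correlation coefficient
    ccc = (2 * np.cov(obs, pred, bias=True)[0, 1]
           / (obs.var() + pred.var() + (obs.mean() - pred.mean()) ** 2))
    ave = 1 - (err ** 2).sum() / ((obs - obs.mean()) ** 2).sum()  # assumed AVE definition
    return me, rmse, ccc, ave

obs = np.array([120.0, 210.0, 95.0, 310.0, 180.0])    # illustrative field observations
pred = np.array([130.0, 190.0, 110.0, 280.0, 200.0])  # illustrative model predictions
print("ME=%.1f  RMSE=%.1f  CCC=%.2f  AVE=%.2f" % validation_metrics(obs, pred))
```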
- Conference Article
- 10.69997/sct.122855
- Jul 1, 2025
A Digital Twin (DT) is a purposeful digital representation of a physical entity that employs data, algorithms, and software to enhance operations, making it possible, for example, to forecast failures or evaluate new designs through the simulation of real-world scenarios. DTs are enablers for real-time monitoring, simulation, and optimization. However, traditional simulation DTs often rely on complex, non-linear mechanistic models with high computational demands, complex structures, and a large number of specific parameters, and thus pose quite a challenge to maintainability. Surrogate models, on the other hand, are simplified approximations of more complex, higher-order models. These approximations are typically built using data-driven approaches, such as Random Forest Regression, facilitating faster simulations, simpler adaptation, and quicker deployment. This study analyzes the complexity of mechanistic and surrogate modeling approaches in the context of DTs to aid model selection. A model with reduced complexity enhances computational efficiency, simplifies implementation, and supports real-time monitoring and predictive maintenance. Complexity analysis evaluates metrics such as analytical, structural, space, behavioral, training, and prediction complexity, resulting in an overall complexity score for model selection. However, the decision involves trade-offs, such as balancing high fidelity with low complexity or prioritizing high explainability over structural simplicity. Addressing these trade-offs is essential in selecting a model that balances the accuracy, usability, and efficiency of DTs. Using a stirred tank reactor as a use case, the mechanistic model is compared to a surrogate model to quantify complexity scores and select a less complex model for DT development.
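As a rough illustration of the surrogate idea compared in this study, the sketch below fits a Random Forest surrogate to the output of a toy mechanistic reactor relation (first-order reaction in an ideal CSTR with Arrhenius kinetics); the equations, parameter values, and operating ranges are invented for illustration and are not the study's stirred tank model.

```python
# Data-driven surrogate of a (toy) mechanistic steady-state reactor model.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

R = 8.314  # universal gas constant, J/(mol K)

def cstr_outlet_conc(T, tau, c_in=1.0, k0=1e6, Ea=60e3):
    """Mechanistic model: first-order reaction in an ideally mixed CSTR."""
    k = k0 * np.exp(-Ea / (R * T))       # Arrhenius rate constant
    return c_in / (1.0 + k * tau)        # steady-state outlet concentration

rng = np.random.default_rng(1)
T = rng.uniform(300, 400, 500)           # temperature [K]
tau = rng.uniform(10, 300, 500)          # residence time [s]
X = np.column_stack([T, tau])
y = cstr_outlet_conc(T, tau)             # "simulation runs" of the mechanistic model

surrogate = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print("surrogate prediction :", surrogate.predict([[350.0, 120.0]])[0])
print("mechanistic model    :", cstr_outlet_conc(350.0, 120.0))
```

The trade-off the study quantifies is visible even here: the surrogate is cheaper to evaluate and adapt, but it is only as trustworthy as the simulations or data used to train it.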
- Supplementary Content
- 10.1007/s00399-024-01014-0
- Jan 1, 2024
- Herzschrittmachertherapie & Elektrophysiologie
Cardiac arrhythmias remain a major cause of death and disability. Current antiarrhythmic therapies are effective to only a limited extent, likely in large part due to their mechanism-independent approach. Precision cardiology aims to deliver targeted therapy for an individual patient to maximize efficacy and minimize adverse effects. In-silico digital twins have emerged as a promising strategy to realize the vision of precision cardiology. While there is no uniform definition of a digital twin, it typically employs digital tools, including simulations of mechanistic computer models, based on patient-specific clinical data to understand arrhythmia mechanisms and/or make clinically relevant predictions. Digital twins have become part of routine clinical practice in the setting of interventional cardiology, where commercially available services use digital twins to non-invasively determine the severity of stenosis (computed tomography-based fractional flow reserve). Although routine clinical application has not been achieved for cardiac arrhythmia management, significant progress towards digital twins for cardiac electrophysiology has been made in recent years. At the same time, significant technical and clinical challenges remain. This article provides a short overview of the history of digital twins for cardiac electrophysiology, including recent applications for the prediction of sudden cardiac death risk and the tailoring of rhythm control in atrial fibrillation. The authors highlight the current challenges for routine clinical application and discuss how overcoming these challenges may allow digital twins to enable a significant precision medicine-based advancement in cardiac arrhythmia management.
- Single Report
- 10.2172/1881930
- Sep 12, 2022
Since each cancer has its own unique characteristics, each one can respond differently to the same treatments. Therefore, the creation of a digital twin (DT) of cancer can assist us in predicting the evolution of an individual's cancer through modeling each tumor's characteristics and response to treatment. Hence, we propose to take advantage of recent advances in computational approaches and combine mechanistic, machine learning, and stochastic modeling approaches to create “My Virtual Cancer”, a DT platform. To establish a personalized DT, we use patient-specific data for parameter estimation, sensitivity analysis, and uncertainty quantification. For each patient, we will estimate the values of the parameters of their QSP model using the patient's data. We perform a multi-dimensional sensitivity analysis and uncertainty quantification on the mechanistic model to find a set of critical interactions and predict confidence intervals. Since this QSP model includes the data-driven mechanistic model of cell and molecule interaction networks, one of the ultimate results of this DT would be the prediction of tumor evolution.
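The QSP model itself is not reproduced in this abstract; the sketch below only illustrates the general calibration-and-sensitivity step it describes, using a toy logistic tumor growth model and invented patient observations.

```python
# Patient-specific calibration of a toy growth model plus one-at-a-time sensitivity.
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import curve_fit

def tumor_volume(t, r, K, v0=0.1):
    """Logistic growth stand-in for a QSP model: dV/dt = r V (1 - V/K)."""
    return odeint(lambda v, _t: r * v * (1 - v / K), v0, t).ravel()

t_obs = np.array([0.0, 7.0, 14.0, 21.0, 28.0])       # days since baseline scan
v_obs = np.array([0.10, 0.24, 0.48, 0.80, 1.05])     # hypothetical volumes (cm^3)

(r_hat, K_hat), _ = curve_fit(tumor_volume, t_obs, v_obs,
                              p0=[0.1, 2.0], bounds=([0.01, 0.2], [1.0, 10.0]))
print(f"estimated r = {r_hat:.3f}/day, K = {K_hat:.2f} cm^3")

# Local sensitivity: change in predicted day-28 volume for a +10% parameter change
base = tumor_volume(t_obs, r_hat, K_hat)[-1]
for name, args in [("r", (1.1 * r_hat, K_hat)), ("K", (r_hat, 1.1 * K_hat))]:
    change = tumor_volume(t_obs, *args)[-1] / base - 1
    print(f"+10% in {name}: day-28 volume changes by {100 * change:+.1f}%")
```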
- Book Chapter
- 10.1007/978-3-030-78307-5_14
- Jan 1, 2022
This chapter presents a Digital Twin Pipeline Framework of the COGNITWIN project that supports Hybrid and Cognitive Digital Twins, through four Big Data and AI pipeline steps adapted for Digital Twins. The pipeline steps are Data Acquisition, Data Representation, AI/Machine Learning, and Visualisation and Control. Big Data and AI technology selections of the Digital Twin system are related to the different technology areas in the BDV Reference Model. A Hybrid Digital Twin is defined as a combination of a data-driven Digital Twin with first-order physical models. The chapter illustrates the use of a Hybrid Digital Twin approach by describing an application example of spiral-welded steel industrial machinery maintenance, with a focus on the Digital Twin support for Predictive Maintenance. A further extension, currently in progress, supports Cognitive Digital Twins by adding capabilities for learning, understanding, and planning, including the use of domain and human knowledge. By using digital, hybrid, and cognitive twins, the project's presented pilot aims to reduce energy consumption and the average duration of machine downtimes. Data-driven artificial intelligence methods and predictive analytics models that are deployed in the Digital Twin pipeline are detailed with a focus on decreasing the machinery's unplanned downtime. We conclude that the presented pipeline can be used for similar cases in the process industry.
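As a loose illustration of the hybrid idea defined above, the sketch below combines a first-order physical baseline (Newtonian cooling) with a data-driven model of its residual; the cooling law, the extra operating variable, and the data are assumptions for illustration, not the COGNITWIN pilot's models.

```python
# Hybrid twin sketch: physics-based baseline + data-driven residual correction.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def physical_baseline(t, T0=900.0, T_env=25.0, k=0.01):
    """First-order cooling model: T(t) = T_env + (T0 - T_env) * exp(-k t)."""
    return T_env + (T0 - T_env) * np.exp(-k * t)

rng = np.random.default_rng(2)
t = rng.uniform(0, 600, 400)                 # seconds since start of the process step
load = rng.uniform(0.2, 1.0, 400)            # operating condition the physics ignores
T_measured = physical_baseline(t) + 40 * load + rng.normal(0, 2, 400)

# Data-driven part: learn what the physics misses (the residual)
residual_model = GradientBoostingRegressor(random_state=0).fit(
    np.column_stack([t, load]), T_measured - physical_baseline(t))

def hybrid_predict(t, load):
    return physical_baseline(t) + residual_model.predict(np.column_stack([t, load]))

print(hybrid_predict(np.array([300.0]), np.array([0.8])))
```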
- Research Article
- 10.1016/j.ijme.2013.05.001
- Jun 17, 2013
- The International Journal of Management Education
From theory to practice: Teaching management using films through deductive and inductive processes
- Supplementary Content
- 10.1007/s40471-016-0078-4
- Jan 1, 2016
- Current Epidemiology Reports
The dynamics of infectious disease epidemics are driven by interactions between individuals with differing disease status (e.g., susceptible, infected, immune). Mechanistic models that capture the dynamics of such “dependent happenings” are a fundamental tool of infectious disease epidemiology. Recent methodological advances combined with access to new data sources and computational power have resulted in an explosion in the use of dynamic models in the analysis of emerging and established infectious diseases. Increasing use of models to inform practical public health decision making has challenged the field to develop new methods to exploit available data and appropriately characterize the uncertainty in the results. Here, we discuss recent advances and areas of active research in the mechanistic and dynamic modeling of infectious disease. We highlight how a growing emphasis on data and inference, novel forecasting methods, and increasing access to “big data” are changing the field of infectious disease dynamics. We showcase the application of these methods in phylodynamic research, which combines mechanistic models with rich sources of molecular data to tie genetic data to population-level disease dynamics. As dynamic and mechanistic modeling methods mature and are increasingly tied to principled statistical approaches, the historic separation between infectious disease dynamics and “traditional” epidemiologic methods is beginning to erode; this presents new opportunities for cross-pollination between fields and novel applications.
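The classic SIR model is perhaps the most compact example of the “dependent happenings” idea: each individual's infection risk depends on how many others are currently infectious. The sketch below uses illustrative parameter values not tied to any disease discussed in the review.

```python
# Minimal SIR model: transmission couples susceptible and infectious compartments.
import numpy as np
from scipy.integrate import odeint

def sir(y, t, beta, gamma):
    s, i, r = y
    new_infections = beta * s * i          # risk to S depends on current prevalence I
    return [-new_infections, new_infections - gamma * i, gamma * i]

t = np.linspace(0, 120, 121)               # days
beta, gamma = 0.3, 0.1                     # illustrative values, R0 = beta/gamma = 3
s, i, r = odeint(sir, [0.999, 0.001, 0.0], t, args=(beta, gamma)).T
print(f"epidemic peak: {100 * i.max():.1f}% infectious on day {t[i.argmax()]:.0f}")
```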
- Preprint Article
- 10.26434/chemrxiv-2025-r70bs-v2
- Mar 27, 2025
Deriving versatile and robust mechanistic models from experimental data is a key challenge in engineering and natural sciences. This is especially true in chemical reaction engineering, where reactor manufacturers and operators increasingly pursue the development and maintenance of digital twins that rely on frequent model updates and ask for automation of this modelling process. In this work, we propose an automated workflow that generates accurate mechanistic reactor models from experimental concentration data of a given reactor. At the core of this workflow, a reinforcement learning agent assembles an interpretable reactor model by iteratively simplifying general differential balance equations and fitting the resulting candidate model to experimental data. We demonstrate the performance of our workflow in two case studies. An in silico case study shows that the workflow correctly reconstructs the model underlying a synthetic data set, is robust against noise in the input data, and has favourable scaling properties. The agent accelerates the model derivation process significantly compared to an exhaustive enumerative search. Secondly, an experimental case study is conducted employing a Taylor-Couette prototype reactor. A liquid-phase esterification reaction of (2-bromophenyl)methanol and acetic anhydride was used as a test system. Based on the experimental data, the workflow derives meaningful mechanistic models, with the most accurate model showing a normalized root mean squared error of 2.4%. Future work encompasses the integration of automated experiments into the workflow and the transfer of our workflow to process units beyond chemical reactors.
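The innermost step of such a workflow, fitting a candidate rate law to concentration data and scoring it, can be sketched as below; the candidate mechanism, the data, and the range-normalised NRMSE are illustrative assumptions, not the paper's agent, reaction system, or exact error definition.

```python
# Fit one candidate mechanism to concentration data and score it with an NRMSE.
import numpy as np
from scipy.optimize import curve_fit

def candidate_model(t, c0, k):
    """Candidate mechanism: dC/dt = -k C  =>  C(t) = c0 * exp(-k t)."""
    return c0 * np.exp(-k * t)

t = np.array([0.0, 5.0, 10.0, 20.0, 40.0, 60.0])      # min
c = np.array([1.00, 0.78, 0.62, 0.37, 0.14, 0.06])    # mol/L, hypothetical measurements

(c0_hat, k_hat), _ = curve_fit(candidate_model, t, c, p0=[1.0, 0.05])
resid = c - candidate_model(t, c0_hat, k_hat)
nrmse = np.sqrt(np.mean(resid ** 2)) / (c.max() - c.min())   # normalised by data range
print(f"k = {k_hat:.3f} 1/min, NRMSE = {100 * nrmse:.1f}%")
```

An automated workflow of the kind described would generate, fit, and score many such candidates, with the reinforcement learning agent deciding which simplification of the balance equations to try next.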
- Research Article
- 10.1016/j.resconrec.2019.06.002
- Aug 8, 2019
- Resources, Conservation and Recycling
Digital twins probe into food cooling and biochemical quality changes for reducing losses in refrigerated supply chains
- Research Article
- 10.1080/00405000108659564
- Jan 1, 2001
- The Journal of The Textile Institute
Prediction of yarn properties from fibre properties and process parameters is a well-researched topic. For a number of years, mechanistic and statistical models have primarily been used to tackle the problem. Over the last ten years, neural networks have been used in increasing numbers for this purpose. However, a comparative assessment of the performance of these three approaches has not been forthcoming. In this paper, all three models have been applied to the data available to validate the mechanistic model described by Frydrych (pertaining to cotton yarns). The exercise was repeated for data pertaining to yarns spun from polyester staple fibre in the laboratory. The results conclusively prove the superiority of neural networks over mechanistic models and simple regression equations for predicting ring yarn tenacity from fibre properties and process parameters.
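A reduced version of the comparison described above, using two of the three approaches (multiple linear regression vs. a small neural network) on synthetic fibre data, is sketched below; the features, the assumed tenacity relation (with a peak near an optimum twist), and all numbers are illustrative, not the paper's dataset.

```python
# Regression vs. neural network for (synthetic) ring yarn tenacity prediction.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 400
fibre_strength = rng.uniform(20, 35, n)     # cN/tex
fibre_length = rng.uniform(22, 32, n)       # mm
twist_factor = rng.uniform(3.2, 4.6, n)
X = np.column_stack([fibre_strength, fibre_length, twist_factor])
# hypothetical relation: tenacity peaks near an optimum twist factor of ~3.9
y = (0.4 * fibre_strength + 0.2 * fibre_length
     - 6.0 * (twist_factor - 3.9) ** 2 + rng.normal(0, 0.5, n))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
linear = LinearRegression().fit(X_tr, y_tr)
neural = make_pipeline(StandardScaler(),
                       MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                                    random_state=0)).fit(X_tr, y_tr)
print(f"R^2  regression: {linear.score(X_te, y_te):.2f}   "
      f"neural network: {neural.score(X_te, y_te):.2f}")
```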
- Research Article
- 10.1038/s41746-024-01188-4
- Jul 16, 2024
- npj Digital Medicine
Virtual patients and digital patients/twins are two similar concepts gaining increasing attention in health care, with the goals of accelerating drug development and improving patients’ survival, but each with its own limitations. Although methods have been proposed to generate virtual patient populations using mechanistic models, there are only a limited number of applications in immuno-oncology research. Furthermore, due to the stricter requirements of digital twins, they are often generated in a study-specific manner with models customized to particular clinical settings (e.g., treatment, cancer, and data types). Here, we discuss the challenges for virtual patient generation in immuno-oncology with our most recent experiences, initiatives to develop digital twins, and how research on these two concepts can inform each other.
- Research Article
- 10.1093/comnet/cnz024
- Aug 2, 2019
- Journal of Complex Networks
Network models are applied across many domains where data can be represented as a network. Two prominent paradigms for modelling networks are statistical models (probabilistic models for the observed network) and mechanistic models (models for network growth and/or evolution). Mechanistic models are better suited for incorporating domain knowledge, to study effects of interventions (such as changes to specific mechanisms) and to forward simulate, but they typically have intractable likelihoods. As such, and in stark contrast to statistical models, there is a relative dearth of research on model selection for such models despite the otherwise large body of extant work. In this article, we propose a simulator-based procedure for mechanistic network model selection that borrows aspects from Approximate Bayesian Computation, along with a means to quantify the uncertainty in the selected model. To select the most suitable network model, we consider and assess the performance of several learning algorithms, most notably the so-called Super Learner, which makes our framework less sensitive to the choice of a particular learning algorithm. Our approach takes advantage of the ease of forward simulation from mechanistic network models to circumvent their intractable likelihoods. The overall process is flexible and widely applicable. Our simulation results demonstrate the approach's ability to accurately discriminate between competing mechanistic models. Finally, we showcase our approach with a protein-protein interaction network model from the literature for yeast (Saccharomyces cerevisiae).
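A stripped-down version of the simulator-based selection idea might look like the sketch below: forward-simulate from two candidate mechanisms, summarise each simulated network with a few statistics, train a classifier on the summaries, and read off rough model probabilities for an observed network. The candidate models, summary statistics, and the random forest (standing in for the Super Learner) are illustrative choices, not the paper's.

```python
# Simulator-based selection between two toy mechanistic network models.
import numpy as np
import networkx as nx
from sklearn.ensemble import RandomForestClassifier

def summarise(g):
    degs = [d for _, d in g.degree()]
    return [np.mean(degs), np.var(degs),
            nx.average_clustering(g), nx.degree_assortativity_coefficient(g)]

X, y = [], []
for seed in range(200):
    g_pa = nx.barabasi_albert_graph(100, 2, seed=seed)       # preferential attachment
    g_rnd = nx.gnp_random_graph(100, 0.04, seed=seed)        # uniformly random attachment
    X += [summarise(g_pa), summarise(g_rnd)]
    y += ["preferential attachment", "random attachment"]

clf = RandomForestClassifier(random_state=0).fit(X, y)

observed = nx.barabasi_albert_graph(100, 2, seed=10_000)     # pretend this is the data
probs = clf.predict_proba([summarise(observed)])[0]
print(dict(zip(clf.classes_, probs.round(2))))               # rough model "probabilities"
```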
- Research Article
- 10.3897/biss.7.112373
- Sep 11, 2023
- Biodiversity Information Science and Standards
The Biodiversity Digital Twin (BioDT) project (2022-2025) aims to create prototypes that integrate various data sets, models, and expert domain knowledge enabling prediction capabilities and decision-making support for critical issues in biodiversity dynamics. While digital twin concepts have been applied in industries for continuous monitoring of physical phenomena, their application in biodiversity and environmental sciences presents novel challenges (Bauer et al. 2021, de Koning et al. 2023). In addition, successfully developing digital twins for biodiversity requires addressing interoperability challenges in data standards. BioDT is developing prototype digital twins based on use cases that span various data complexities, from point occurrence data to bioacoustics, covering nationwide forest states to specific communities and individual species. The project relies on FAIR principles (Findable, Accessible, Interoperable, and Reusable) and FAIR enabling resources like standards and vocabularies (Schultes et al. 2020) to enable the exchange, sharing, and reuse of biodiversity information, fostering collaboration among participating research infrastructures (DiSSCo, eLTER, GBIF, and LifeWatch) and data providers. It also involves creating a harmonised abstraction layer using Persistent Identifiers (PID) and FAIR Digital Object (FDO) records, alongside semantic mapping and crosswalk techniques to provide machine-actionable metadata (Schultes and Wittenburg 2019, Schwardmann 2020). Governance and engagement with research infrastructure stakeholders play crucial roles in this regard, with a focus on aligning technical and data standards discussions. In addition to data, models and workflows are key elements in BioDT. Models in the BioDT context are formal representations of problems or processes, implemented through equations, algorithms, or a combination of both, which can be executed by machine entities. The current twin prototypes are considering both statistical and mechanistic models, introducing significant variations in (1) data requirements, (2) modelling approaches and philosophy, and (3) model output. The BioDT consortium will develop guidelines and protocols for how to describe these models, what metadata to include, and how they will interact with the diverse datasets. While discussions on this topic exist within the broader context of biodiversity and ecological sciences (Jeltsch et al. 2013, Fer et al. 2020), the BioDT project is strongly committed to finding a solution within its scope. In the twinning context, data and models need to be executed within a computing infrastructure and also need to adhere to FAIR principles. Software within BioDT includes a suite of tools that facilitate data acquisition, storage, processing, and analysis. While some of these tools already exist, the challenge lies in integrating them within the digital twinning framework. One approach to achieving integration is through workflow representation, encompassing standardised procedures and protocols that guide the acquisition, packaging, processing, and analysis of data. The project is exploring Research Object Crate (RO-Crate) implementation for this (Soiland-Reyes et al. 2022). Implementing workflows can ensure reproducibility, scalability, and transparency in research practices, enabling scientists to validate and replicate findings. The BioDT project offers a novel and transformative approach to biodiversity research and application. 
By leveraging collaborative research infrastructures and adhering to data standards, BioDT aims to harness the power of data, software, supercomputers, models, and expertise to provide new insights. The foundation provided by the data standards, including those of Biodiversity Information Standards (TDWG), is crucial in realising the full potential of digital twins, facilitating the seamless integration of diverse data sources and combinations with models.
- Research Article
- Aug 28, 2023
- arXiv
Despite the remarkable advances in cancer diagnosis, treatment, and management that have occurred over the past decade, malignant tumors remain a major public health problem. Further progress in combating cancer may be enabled by personalizing the delivery of therapies according to the predicted response for each individual patient. The design of personalized therapies requires patient-specific information integrated into an appropriate mathematical model of tumor response. A fundamental barrier to realizing this paradigm is the current lack of a rigorous, yet practical, mathematical theory of tumor initiation, development, invasion, and response to therapy. In this review, we begin by providing an overview of different approaches to modeling tumor growth and treatment, including mechanistic as well as data-driven models based on “big data” and artificial intelligence. Next, we present illustrative examples of mathematical models that demonstrate their utility, and discuss the limitations of stand-alone mechanistic and data-driven models. We further discuss the potential of mechanistic models for not only predicting, but also optimizing, response to therapy on a patient-specific basis. We then discuss current efforts and future possibilities to integrate mechanistic and data-driven models. We conclude by proposing five fundamental challenges that must be addressed to fully realize personalized care for cancer patients driven by computational models.