Abstract

The analysis of large amounts of data from electronic medical records (EMRs) and from daily clinical practice has received increasing attention in recent years. However, few systematic approaches have been proposed to extract the wealth and diversity of information that these data sources contain. In particular, Acute Coronary Syndrome (ACS) data are available in many hospitals and health units because of the elevated morbidity and mortality of ACS. This work proposes a method called Data Science Analysis and Representation (DSAR) to scrutinize and exploit, in a univariate way, the scientific information content of limited ACS samples. DSAR uses Bootstrap Resampling to provide robust, cross-sectional, non-parametric statistical tests on categorical and metric variables. It also constructs an informative graphical representation of the database variables, which helps to interpret the results and to identify the relevant variables. Our objectives were to validate DSAR by comparing it with conventional statistical methods when looking for the most relevant variables in the secondary prevention of ACS, and to determine the degree of correlation between those variables and the Exitus event (associated with patient death). To achieve these objectives, we applied DSAR to an anonymized sample of 270 variables from 2377 patients diagnosed with ACS. The results showed that DSAR identified 44% of the variables as significant, whereas conventional methods offered weak correlation results. The scientific literature was then reviewed for a set of these variables, confirming agreement with clinical experience and previous ACS research. We conclude that DSAR is a valuable and useful method for clinicians in the identification of potentially predictive variables and, overall, a good starting point for future multivariate secondary analyses in the clinical field of ACS or in fields with similar information characteristics.
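To make the idea of a bootstrap-based, non-parametric univariate test more concrete, the following is a minimal sketch of a percentile bootstrap comparison between two patient groups. It is not the authors' DSAR implementation, whose details are not given in the abstract. The function name `bootstrap_diff_ci`, the number of resamples, the significance level, and the synthetic age values are all assumptions made for illustration only.

```python
import random
from statistics import mean


def bootstrap_diff_ci(group_a, group_b, statistic=mean,
                      n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for statistic(group_a) - statistic(group_b).

    Hypothetical helper for illustration; not taken from the DSAR paper.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        resampled_a = rng.choices(group_a, k=len(group_a))
        resampled_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistic(resampled_a) - statistic(resampled_b))
    diffs.sort()
    # Approximate percentile bounds taken from the sorted bootstrap differences.
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    # Flag the variable as significant when the interval excludes zero.
    significant = lower > 0 or upper < 0
    return lower, upper, significant


# Synthetic example data (assumed values, not from the study):
# a metric variable such as age, split by the Exitus event.
demo_rng = random.Random(42)
age_exitus = [demo_rng.gauss(75, 8) for _ in range(120)]
age_alive = [demo_rng.gauss(65, 10) for _ in range(400)]

low, high, sig = bootstrap_diff_ci(age_exitus, age_alive)
print(f"Interval for the difference in means: [{low:.2f}, {high:.2f}], significant: {sig}")
```

For a categorical variable coded as 0/1, the same function can be applied unchanged, since the mean of a binary variable is its proportion. Repeating such a test for each variable against Exitus would give a univariate screening in the spirit described above, under the stated assumptions.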

Highlights

  • We have a new scenario with information and communication technologies continually advancing toward better and faster solutions with limitless possibilities, the like of which we have not seen before

  • Hypothesis 1 will be verified by contrasting the Data Science Analysis and Representation (DSAR) results with those offered by conventional statistics, while Hypothesis 2 will be verified by comparing the DSAR results with those obtained by reviewing scientific literature based on larger samples

  • IDENTIFICATION OF SIGNIFICANT VARIABLES: we first analyze whether DSAR can obtain the significant variables in relation to the variable Exitus

Summary

Introduction

We have a new scenario with information and communication technologies continually advancing toward better and faster solutions with limitless possibilities, the like of which we have not seen before. Statistical learning combined with data availability has the potential to change how medical information is processed and treatments are applied [1]. eHealth solutions, however, are not adapted to the daily complexity of clinical practice, and they still lack the integration required to truly exploit the available clinical data.

DATA SCIENCE AND RELATED PRECEDENTS

Clinical risk prediction based on data analysis has been recognised as a useful tool for managing disease care and treatments. Such techniques are currently a promising technology for extracting relevant information in medical therapeutics. Neural networks have likewise been employed [10], [11] to predict adverse outcomes for ACS, and they have proven successful in predicting potential ACS patients.

The associate editor coordinating the review of this manuscript and approving it for publication was Kin Fong Lei.
