The Importance of Nonlinear Transformations Use in Medical Data Analysis.

Netta Shachar, Yoav Benjamini, Barak Brill, Tal Galili, Mira Marcus-Kalish, Tal Kozlovski, Alexis Mitelpunkt, Tzviel Frostig

doi:10.2196/medinform.7992

Open Access

Abstract

Background: The accumulation of data and its accessibility through easier-to-use platforms will allow data scientists and practitioners who are less sophisticated data analysts to get answers by using big data for many purposes in multiple ways. Data scientists working with medical data are aware of the importance of preprocessing, yet in many cases the potential benefits of using nonlinear transformations are overlooked.

Objective: Our aim is to present a semi-automated approach of symmetry-aiming transformations tailored for medical data analysis, together with its advantages.

Methods: We describe 10 commonly encountered data types used in the medical field and the relevant transformations for each data type. Data from the Alzheimer’s Disease Neuroimaging Initiative study, a Parkinson’s disease hospital cohort, and disease-simulating data were used to demonstrate the approach and its benefits.

Results: Symmetry-targeted monotone transformations were applied, and the advantages gained in variance, stability, linearity, and clustering are demonstrated. An open source application implementing the described methods was developed. Both the linearity of relationships and the stability of variability improved after applying a proper nonlinear transformation. Clustering simulated nonsymmetric data agreed poorly with the generating clusters (Rand value=0.681), whereas clustering after a nonlinear transformation to symmetry captured the original structure (Rand value=0.986).

Conclusions: This work presents the use of nonlinear transformations for medical data and the importance of their semi-automated choice. Using the described approach, the data analyst increases the ability to create simpler, more robust, and translational models, thereby facilitating the interpretation and implementation of the analysis by medical practitioners. Applying nonlinear transformations as part of the preprocessing is essential to the quality and interpretability of results.
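To make the idea of a symmetry-targeted monotone transformation concrete, the following minimal sketch picks, from a small candidate set, the transformation whose output has the smallest absolute sample skewness. It is an illustration only and not the authors' application: the candidate set, the use of skewness as the symmetry criterion, the function names `skewness` and `most_symmetric`, and the simulated lognormal sample are all assumptions made for this example.

```python
import math
import random


def skewness(values):
    """Sample skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical candidate set of monotone increasing transformations
# for strictly positive measurements (an assumption of this sketch).
CANDIDATES = {
    "identity": lambda v: v,
    "sqrt": math.sqrt,
    "log": math.log,
    "negative reciprocal": lambda v: -1.0 / v,
}


def most_symmetric(values):
    """Return the candidate name with the smallest |skewness| and all scores."""
    if any(v <= 0 for v in values):
        raise ValueError("this sketch assumes strictly positive values")
    scores = {
        name: abs(skewness([transform(v) for v in values]))
        for name, transform in CANDIDATES.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Simulated right-skewed data standing in for a skewed lab measurement.
rng = random.Random(0)
sample = [rng.lognormvariate(1.0, 0.8) for _ in range(5000)]

best, scores = most_symmetric(sample)
print(best, {name: round(score, 3) for name, score in scores.items()})
```

In practice the choice would remain semi-automated, as the abstract stresses: a criterion such as this one can propose a transformation, and the analyst confirms that it suits the data type at hand.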

Highlights

The volume of data collected these days is constantly growing and is expected to reach 44 zettabytes by 2020 [1], and medical data are rapidly catching on to this trend.
Informed use of data collected from the entire population, in a way that can lead to better treatment for each patient, is a challenging goal.
Unlike data traditionally collected for a specific purpose, big data and its relevant subsets are analyzed for multiple purposes and in multiple ways, using statistical models, data mining algorithms, machine learning methods, and others.

Summary

Introduction

Medical Data Analysis

The volume of data collected these days is constantly growing and is expected to reach 44 zettabytes by 2020 [1], and medical data are rapidly catching on to this trend. The accumulation of big data and its accessibility through easier-to-use platforms, such as the open source KNIME Analytics Platform or older commercial software such as SPSS Modeler, will allow data scientists and practitioners who are not expert data analysts to get answers by using big data for many purposes in multiple ways. An open source application implementing the described methods was developed. Both the linearity of relationships and the stability of variability improved after applying a proper nonlinear transformation. With this approach, the data analyst increases the ability to create simpler, more robust, and translational models, thereby facilitating the interpretation and implementation of the analysis by medical practitioners. Applying nonlinear transformations as part of the preprocessing is essential to the quality and interpretability of results.

Methods

Results

Conclusion