Abstract

Objectives: In the context of exploratory data analysis and machine learning, standardization of laboratory results is an important pre-processing step. Varying proportions of pathological results in routine datasets shift the mean (µ) and standard deviation (σ), and thus cause problems for the classical z-score transformation. This study therefore investigates whether the zlog transformation compensates for these disadvantages and makes the results more meaningful from a medical perspective.

Methods: The results presented here were obtained with the statistical software environment R, and the underlying data set was obtained from the UC Irvine Machine Learning Repository. We compare the zlog and z-score transformations across five dimension reduction methods, hierarchical clustering, and four supervised classification methods.

Results: In this study, the zlog transformation gave better results than the z-score transformation for dimension reduction, clustering, and classification. By compensating for the disadvantages of the z-score transformation, the zlog transformation allows more meaningful medical conclusions.

Conclusions: We recommend using the zlog transformation of laboratory results for pre-processing when exploratory data analysis and machine learning techniques are applied.
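To make the contrast between the two transformations concrete, the following is a minimal sketch, not the authors' implementation. The study itself used R, so Python is used here purely for illustration. The abstract does not spell out the zlog formula, so the sketch assumes the commonly published definition, in which the lower and upper reference limits (LL, UL) are mapped to -1.96 and +1.96 on a logarithmic scale. The function names and the reference interval of 10 to 50 are hypothetical.

```python
import math
import statistics


def z_score(x, values):
    """Classical z-score: depends on the mean and SD of the dataset itself."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return (x - mu) / sigma


def zlog(x, lower_limit, upper_limit):
    """zlog (assumed definition): depends only on the reference interval.

    The log-transformed reference limits are treated as the central 95 %
    range of a normal distribution, i.e. mean +/- 1.96 SD.
    """
    log_ll = math.log(lower_limit)
    log_ul = math.log(upper_limit)
    mu_log = (log_ll + log_ul) / 2
    sigma_log = (log_ul - log_ll) / 3.92  # 3.92 = 2 * 1.96
    return (math.log(x) - mu_log) / sigma_log


# Hypothetical analyte with a reference interval of 10 to 50 (arbitrary units).
LL, UL = 10.0, 50.0

# Two made-up cohorts: one mostly healthy, one with many pathological values.
mostly_healthy = [15.0, 20.0, 25.0, 30.0, 35.0, 40.0]
many_pathological = [15.0, 20.0, 25.0, 120.0, 200.0, 300.0]

# The same measurement gets a different z-score in each cohort ...
z_in_healthy = z_score(30.0, mostly_healthy)
z_in_pathological = z_score(30.0, many_pathological)

# ... but its zlog value does not depend on the cohort at all.
zlog_value = zlog(30.0, LL, UL)
```

Under this assumed definition, a value equal to a reference limit always has a zlog of magnitude 1.96, regardless of how many pathological results the dataset contains, whereas the z-score of the same measurement changes with the composition of the cohort.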
