Anatomy of a Data Science Software Toolkit That Uses Machine Learning to Aid ‘Bench-to-Bedside’ Medical Research—With Essential Concepts of Data Mining and Analysis Explained

László Beinrohr, Péter Piros, Krasimir Kolev, Eszter Kail, Rita Fleiner, Erzsébet Tóth

doi:10.3390/app112412135

Abstract

Data science and machine learning are buzzwords of the early 21st century. Now that they pervade everyday life, how do these concepts translate to use by researchers and clinicians in the life-science and medical fields? Here, we describe a software toolkit that is just large enough in scale to be maintained and extended by a small team and is optimised for problems that arise in small and medium-sized laboratories. In particular, a single person can manage the system from data ingestion through statistics and preparation to predictions. At the system’s core is a graph-type database, which keeps it flexible in the face of the irregular, constantly changing data types that are common in explorative research. At the system’s outermost shell, the concept of ‘user stories’ is introduced to help end-user researchers perform tasks separated by their expertise: these range from simple data input and data curation, through statistics, to predictions via machine learning algorithms. We compiled a sizable list of existing, modular Python libraries for data analysis that may serve as a reference in the field and may be incorporated into this software. We also provide an insight into basic concepts, such as labelled vs. unlabelled data, supervised vs. unsupervised learning, regression vs. classification, and evaluation by different error metrics, as well as the more advanced concept of cross-validation. Finally, we show some examples from our laboratory using our blood sample and blood clot data from thrombosis patients (sufferers of stroke, heart and peripheral thrombosis disease) and discuss how such tools can help to set realistic expectations and reveal caveats.
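Several of the concepts named in the abstract (labelled data, regression, an error metric, and cross-validation) can be illustrated together in a few lines. The following is a minimal sketch, not the toolkit described in the paper: the data are synthetic, the model is a one-feature least-squares line, and the function names `fit_line`, `mean_absolute_error` and `k_fold_mae` are hypothetical helpers introduced only for this illustration. It uses the Python standard library alone.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b for a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between labels and predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def k_fold_mae(xs, ys, k=5, seed=0):
    """Return one held-out MAE per fold under k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)          # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]     # k disjoint index groups
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a * xs[i] + b for i in held_out]
        scores.append(mean_absolute_error([ys[i] for i in held_out], preds))
    return scores


# Synthetic labelled data (an assumption for illustration, not patient data):
# the label y depends linearly on the feature x, plus Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

scores = k_fold_mae(xs, ys, k=5)
print("MAE per fold:", [round(s, 3) for s in scores])
print("Mean MAE:", round(mean(scores), 3))
```

Because every sample is held out exactly once, the spread of the per-fold errors gives a rough sense of how much a single train/test split could mislead, which is one way such tools help to set realistic expectations.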

Highlights

Coronary artery disease (CAD), acute ischemic stroke (AIS), and peripheral artery disease (PAD) are cardiovascular diseases and represent the leading causes of morbidity and mortality globally [1].
Does our research yield tangible results such as new scientific hypotheses? Does our research have clinical implications? This is the model we will follow and around which we build our homemade software, which will be described in more detail.
We have seen that modern workflow (CRISP-DM standard) with modern software

Summary

Introduction

Coronary artery disease (CAD), acute ischemic stroke (AIS), and peripheral artery disease (PAD) are cardiovascular diseases and represent the leading causes of morbidity and mortality globally [1]. The acute tissue damage is mostly due to thrombi occluding the supplying arteries [2]. The lysis susceptibility and stability of these thrombi determine the fate of the patient [3]. Can we predict the diseases from this data? Can we predict them before disease onset? This is especially so with complex, data-driven projects, such as the ‘bench-to-bedside’ projects often seen in the medical and life science field. This problem has been seen before and some solutions, or rather guidelines, were devised. International Business Machines (IBM) researchers and automotive engineers (Daimler-Chrysler) faced the same problem in the



Journal: Applied Sciences
Publication Date: Dec 20, 2021
Citations: 3
License type: CC BY 4.0
