Abstract

Data science and machine learning are buzzwords of the early 21st century. Now that they pervade everyday life, how do these concepts translate into use by researchers and clinicians in the life science and medical fields? Here, we describe a software toolkit that is just large enough in scale to be maintained and extended by a small team and is optimised for problems that arise in small and medium-sized laboratories. In particular, a single person can manage the whole system, from data ingestion through statistics and preparation to predictions. At the system’s core is a graph-type database, which keeps it flexible with respect to the irregular, constantly changing data types that are common in explorative research. At the system’s outermost shell, the concept of ‘user stories’ is introduced to help end-user researchers perform tasks matched to their expertise: these range from simple data input through data curation and statistics to predictions via machine learning algorithms. We compiled a sizable list of existing, modular Python libraries for data analysis that may serve as a reference in the field and may be incorporated into this software. We also provide an insight into basic concepts, such as labelled vs. unlabelled data, supervised vs. unsupervised learning, regression vs. classification, and evaluation by different error metrics, as well as the more advanced concept of cross-validation. Finally, we show some examples from our laboratory using blood sample and blood clot data from thrombosis patients (patients suffering from stroke, heart and peripheral thrombotic disease) and how such tools can help to set realistic expectations and reveal caveats.

Highlights

  • Coronary artery disease (CAD), acute ischemic stroke (AIS) and peripheral artery disease (PAD) are cardiovascular diseases and represent the leading causes of morbidity and mortality globally [1]

  • Does our research yield tangible results such as new scientific hypotheses? Does our research have clinical implications? This is the model we will follow and around which we build our homemade software, which will be described in more detail

  • We have seen that modern workflow (CRISP-DM standard) with modern software


Summary

Introduction

Coronary artery disease (CAD), acute ischemic stroke (AIS) and peripheral artery disease (PAD) are cardiovascular diseases and represent the leading causes of morbidity and mortality globally [1]. The acute tissue damage is mostly due to thrombi occluding the supplying arteries [2]. The lysis susceptibility and stability of these thrombi determine the fate of the patient [3]. Can we predict these diseases from this data? Can we predict them before disease onset? This is especially so with complex, data-driven projects, such as the ‘bench-to-bedside’ projects often seen in the medical and life science field. This problem has been seen before, and some solutions, or rather guidelines, were devised. International Business Machines (IBM) researchers and automotive engineers (Daimler-Chrysler) faced the same problem in the

