Abstract

BackgroundUsing machine-learning techniques, clinical diagnostic model research extracts diagnostic models from patient data. Traditionally, patient data are often collected using electronic Case Report Form (eCRF) systems, while mathematical software is used for analyzing these data using machine-learning techniques. Due to the lack of integration between eCRF systems and mathematical software, extracting diagnostic models is a complex, error-prone process. Moreover, due to the complexity of this process, it is usually only performed once, after a predetermined number of data points have been collected, without insight into the predictive performance of the resulting models.ObjectiveThe objective of the study of Clinical Data Miner (CDM) software framework is to offer an eCRF system with integrated data preprocessing and machine-learning libraries, improving efficiency of the clinical diagnostic model research workflow, and to enable optimization of patient inclusion numbers through study performance monitoring.MethodsThe CDM software framework was developed using a test-driven development (TDD) approach, to ensure high software quality. Architecturally, CDM’s design is split over a number of modules, to ensure future extendability.ResultsThe TDD approach has enabled us to deliver high software quality. CDM’s eCRF Web interface is in active use by the studies of the International Endometrial Tumor Analysis consortium, with over 4000 enrolled patients, and more studies planned. Additionally, a derived user interface has been used in six separate interrater agreement studies. CDM's integrated data preprocessing and machine-learning libraries simplify some otherwise manual and error-prone steps in the clinical diagnostic model research workflow. Furthermore, CDM's libraries provide study coordinators with a method to monitor a study's predictive performance as patient inclusions increase.ConclusionsTo our knowledge, CDM is the only eCRF system integrating data preprocessing and machine-learning libraries. This integration improves the efficiency of the clinical diagnostic model research workflow. Moreover, by simplifying the generation of learning curves, CDM enables study coordinators to assess more accurately when data collection can be terminated, resulting in better models or lower patient recruitment costs.

Highlights

  • Saving Lives With Early DetectionMany diseases, including cancer, may be cured or managed, if diagnosed sufficiently early

  • To our knowledge, Clinical Data Miner (CDM) is the only electronic Case Report Form (eCRF) system integrating data preprocessing and machine-learning libraries. This integration improves the efficiency of the clinical diagnostic model research workflow

  • Examples of such research are the studies organized by the International Ovarian Tumor Analysis [2-5] and International Endometrial Tumor Analysis (IETA) [6] consortia, which investigate diagnostic models for ovarian and endometrial tumors, respectively

Read more

Summary

Introduction

Saving Lives With Early DetectionMany diseases, including cancer, may be cured or managed, if diagnosed sufficiently early. A report from 2009 estimates that, for the case of cancer in the United Kingdom alone, five to ten thousand deaths could be prevented yearly through early diagnosis [1]. Improving early diagnosis could beneficially affect patient outcomes, but is impeded by several factors, including cost and invasiveness of relevant diagnostic procedures. Using machine-learning techniques, clinical diagnostic model research extracts diagnostic models from patient data. Patient data are often collected using electronic Case Report Form (eCRF) systems, while mathematical software is used for analyzing these data using machine-learning techniques. Due to the lack of integration between eCRF systems and mathematical software, extracting diagnostic models is a complex, error-prone process. Due to the complexity of this process, it is usually only performed once, after a predetermined number of data points have been collected, without insight into the predictive performance of the resulting models

Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.