Abstract

In clinical and translational research, data science is often and fortuitously integrated with data collection. This contrasts with the typical position of data scientists in other settings, where they are isolated from data collectors. Because of this integration, effective use of data science techniques to resolve translational questions requires innovation in the organization and management of these data. We propose an operational framework that respects this important difference in how research teams are organized. To maximize the accuracy and speed of the clinical and translational data science enterprise under this framework, we define a set of eight best practices for data management. In our own work at the University of Rochester, we have strived to apply these practices in a customized version of the open source LabKey platform for integrated data management and collaboration. We have applied this platform to cohorts that longitudinally track multidomain data from over 3000 subjects. We argue that this has made analytical datasets more readily available and lowered the barrier to interdisciplinary collaboration, enabling a team-based data science that is unique to the clinical and translational setting.

Highlights

  • In clinical and translational research, data science is often and fortuitously integrated with data collection

  • We propose an idealized workflow for Team Data Science

  • This workflow modifies established workflows that consider data scientists in isolation by adding connections between steps that respect the active and interdisciplinary nature of clinical and translational research. It addresses what we believe is the ultimate goal of this research: to improve human health by enabling reuse of data and models by the scientific community



Introduction

In clinical and translational research, data science is often and fortuitously integrated with data collection. Although data science is often understood to mean the art and science of curating and analyzing data, another reading of the phrase is the use of pre-existing data to conduct science, as opposed to conducting experiments or deriving theory. This latter reading directly implicates translational and clinical science as core domains of data science, since they are empirical disciplines that must rely heavily on observational data. Recent studies suggest that the infant gut and nasal microbiomes, potentially interacting with the immune system, directly impact growth and respiratory health [4]. These systems produce complex and high-dimensional data, such as those from sequencing or flow cytometry. Predictive, and perhaps only phenomenological, models will need to be developed before the mechanisms that generate the associations between microbiotic state, growth, and respiratory health are fully understood.
