Abstract
As datasets have become a more significant aspect of Open Science, attention has turned to the data transformations that drive their creation. Li and Ludäscher have pointed out the importance of identifying data cleaning workflows as a series of modular transformations that can be extracted for reuse. This modular approach aids reproducibility and allows for transparency in data provenance. However, the constantly evolving nature of data science technology means that even once these modules have been identified and implemented, their functionality must be ported to new platforms as old ones become less applicable or less common in a field of study. When such migrations take place, it is important to consider not only practicality and functionality, but also transparency within a data processing team. Clarity of communication within a team is the first step towards providing clear and transparent documentation to the end user. This case study of an updated workflow process for a long-running longitudinal health and well-being study provides practical examples of these principles.