Abstract

Background

Individual clinical trials and cohort studies are a useful source of data that is often under-utilised once a study has ended. Pooling data from multiple sources could increase sample sizes and allow further investigation of treatment effects, even if the original trial did not meet its primary goals. Through the MASTERPLANS (MAximizing Sle ThERapeutic PotentiaL by Application of Novel and Stratified approaches) national consortium, focused on Systemic Lupus Erythematosus (SLE), we have gained valuable real-world experience in aligning, harmonising and combining data from multiple studies and trials, specifically where standards for data capture, representation and documentation were not used or were unavailable. This was not without challenges, arising both from the inherent complexity of the disease and from differences in the way data were captured and represented across different studies.

Main body

Data were, unavoidably, aligned by hand, matching up equivalent or similar patient variables across the different studies. Heterogeneity-related issues were tackled and data were cleaned, organised and combined, resulting in a single large dataset ready for analysis. Overcoming these hurdles, often seen in large-scale harmonisation and integration of legacy datasets, was made possible within a realistic timescale and with limited resources by focusing on specific research questions driven by the aims of MASTERPLANS. Here we describe our experiences tackling the complexities in the integration of large, diverse datasets, and the lessons learned.

Conclusions

Harmonising data across studies can be complex, and can consume considerable time and resources. The work carried out here highlights the importance of using standards for data capture, recording and representation, to facilitate both the integration of large datasets and comparison between studies. Where standards are not implemented at the source, harmonisation is still possible by taking a flexible approach, with systematic preparation and a focus on specific research questions.

Highlights

  • Accomplishing data harmonisation: legacy data can be extremely useful, but harmonising and combining large amounts of data from disparate datasets is not always straightforward [3,4,5]

  • Framing new research questions around groups of patients combined across studies could enable better understanding both of treatment effects and of patient characteristics that differ between these groups

  • We propose that prospective alignment of unstandardised data is achievable with limited resources when specific research questions are used to direct which data are to be integrated across studies


Summary

Conclusions

Where standards are not implemented at the source, some compromise in the approach is unavoidable; we argue that this is made easier when specific research questions are used to direct which data are to be integrated across studies. Similar integration work would benefit from the input of a data management specialist at the earliest stages in the conception of a project or trial. This would allow for standardisation of the resulting integrated dataset, benefiting future investigations. In our case, a flexible approach that enabled the addition of new variables and new datasets meant that the resulting output could be updated to answer new research questions if required.
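As a purely illustrative sketch (not the MASTERPLANS pipeline, which is not described at this level of detail here), the idea of a hand-curated alignment driven by a research question can be expressed as a per-study mapping from local variable names to harmonised names, with only the variables needed for the question at hand being carried into the combined dataset. All study names, variable names and the `harmonise` and `combine` helpers below are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, object]

# Hypothetical hand-curated alignment: for each study, map the local
# variable name to the harmonised name. These are placeholders, not
# variables from any real MASTERPLANS dataset.
VARIABLE_MAP: Dict[str, Dict[str, str]] = {
    "study_a": {"pt_id": "patient_id", "age_yrs": "age", "sex": "sex"},
    "study_b": {"subject": "patient_id", "age": "age", "gender": "sex"},
}


def harmonise(study: str, rows: List[Record], wanted: List[str]) -> List[Record]:
    """Rename one study's variables to harmonised names, keeping only `wanted`."""
    # Invert the mapping: harmonised name -> local name in this study.
    local_for = {target: local for local, target in VARIABLE_MAP[study].items()}
    out: List[Record] = []
    for row in rows:
        record: Record = {"study": study}  # keep provenance of each row
        for name in wanted:
            local = local_for.get(name)
            # A variable the study never captured is recorded as missing (None).
            record[name] = row.get(local) if local is not None else None
        out.append(record)
    return out


def combine(datasets: Dict[str, List[Record]], wanted: List[str]) -> List[Record]:
    """Stack the harmonised rows of every study into one combined dataset."""
    combined: List[Record] = []
    for study, rows in datasets.items():
        combined.extend(harmonise(study, rows, wanted))
    return combined


# The research question decides which harmonised variables are needed.
datasets = {
    "study_a": [{"pt_id": "A1", "age_yrs": 34, "sex": "F"}],
    "study_b": [{"subject": "B7", "age": 51, "gender": "M", "extra": 1}],
}
result = combine(datasets, ["patient_id", "age"])
```

Under these assumptions, adding a new dataset means adding one entry to `VARIABLE_MAP`, and answering a new research question means passing a different `wanted` list, which mirrors the flexibility described above without requiring every variable to be aligned up front.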

