The challenges in data integration – heterogeneity and complexity in clinical trials and patient registries of Systemic Lupus Erythematosus

Helen Le Sueur, Nophar Geifman, Ian N Bruce

doi:10.1186/s12874-020-01057-0

Abstract

Background: Individual clinical trials and cohort studies are a useful source of data, often under-utilised once a study has ended. Pooling data from multiple sources could increase sample sizes and allow for further investigation of treatment effects, even if the original trial did not meet its primary goals. Through the MASTERPLANS (MAximizing Sle ThERapeutic PotentiaL by Application of Novel and Stratified approaches) national consortium, focused on Systemic Lupus Erythematosus (SLE), we have gained valuable real-world experience in aligning, harmonising and combining data from multiple studies and trials, specifically where standards for data capture, representation and documentation were not used or were unavailable. This was not without challenges, arising both from the inherent complexity of the disease and from differences in the way data were captured and represented across different studies.

Main body: Data were, unavoidably, aligned by hand, matching up equivalent or similar patient variables across the different studies. Heterogeneity-related issues were tackled and data were cleaned, organised and combined, resulting in a single large dataset ready for analysis. Overcoming these hurdles, often seen in large-scale harmonisation and integration of legacy datasets, was made possible within a realistic timescale and with limited resource by focusing on specific research questions driven by the aims of MASTERPLANS. Here we describe our experiences tackling the complexities in the integration of large, diverse datasets, and the lessons learned.

Conclusions: Harmonising data across studies can be complex, and time- and resource-consuming. The work carried out here highlights the importance of using standards for data capture, recording, and representation, to facilitate both the integration of large datasets and comparison between studies. Where standards are not implemented at the source, harmonisation is still possible by taking a flexible approach, with systematic preparation and a focus on specific research questions.

Highlights

Accomplishing data harmonisation: legacy data can be extremely useful, but harmonising and combining large amounts of data from disparate datasets is not always straightforward [3,4,5]
Framing new research questions around groups of patients combined across studies could enable both better understanding of treatment effects, and of patient characteristics that differ between these groups
We propose that prospective alignment of unstandardised data is achievable with limited resource when specific research questions are used to direct which data are to be integrated across studies

Summary

Conclusions

Where standards are not implemented at the source, reality dictates having to make a compromise in setting the approach; we argue that this can be made easier when specific research questions are used to direct which data are to be integrated across studies. Similar integration work would benefit from the input of a data management specialist at the earliest stages in the conception of a project or trial. This would allow for standardisation of the resulting integrated dataset, benefiting future investigations. A flexible approach, enabling the addition of new variables and new datasets, meant that the resulting output could be updated to answer new research questions if required.
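The question-driven, extensible approach described above can be illustrated with a minimal sketch. It assumes a hand-curated mapping from each study's local column names to shared variable names; the study names, column names, and the helpers `harmonise` and `pool` are all hypothetical and are not taken from the MASTERPLANS work itself.

```python
from typing import Dict, List

# Hypothetical hand-built alignment: for each study, local column name ->
# shared (harmonised) variable name. Real alignments need expert review.
VARIABLE_MAP: Dict[str, Dict[str, str]] = {
    "trial_a": {"pt_id": "patient_id", "AGE": "age_years", "arm": "treatment"},
    "registry_b": {"subject": "patient_id", "age_at_visit": "age_years"},
}


def harmonise(study: str, rows: List[dict], wanted: List[str]) -> List[dict]:
    """Rename one study's columns to shared names, keeping only `wanted`."""
    mapping = VARIABLE_MAP[study]
    out = []
    for row in rows:
        # Columns with no entry in the mapping are dropped (not yet aligned).
        renamed = {mapping[k]: v for k, v in row.items() if k in mapping}
        record = {"study": study}
        for var in wanted:
            # None marks a variable this study did not capture.
            record[var] = renamed.get(var)
        out.append(record)
    return out


def pool(datasets: Dict[str, List[dict]], wanted: List[str]) -> List[dict]:
    """Combine all studies into one list of records with a common schema."""
    pooled: List[dict] = []
    for study, rows in datasets.items():
        pooled.extend(harmonise(study, rows, wanted))
    return pooled


# Illustrative data only; the research question decides which variables matter.
datasets = {
    "trial_a": [{"pt_id": "A1", "AGE": 34, "arm": "drug_x", "site": "X"}],
    "registry_b": [{"subject": "B7", "age_at_visit": 51}],
}
question_vars = ["patient_id", "age_years", "treatment"]
pooled = pool(datasets, question_vars)
```

In this sketch, a new research question is served by changing `question_vars`, and a new study or variable is added by extending `VARIABLE_MAP`, without touching records that have already been aligned.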

Journal: BMC Medical Research Methodology
Publication Date: Jun 24, 2020
Citations: 12
License type: open-access
