BackgroundThe provision of data sharing statements (DSS) for clinical trials has been made mandatory by different stakeholders. DSS are a device to clarify whether there is intention to share individual participant data (IPD). What is missing is a detailed assessment of whether DSS are providing clear and understandable information about the conditions for data sharing of IPD for secondary use.MethodsA random sample of 200 COVID-19 clinical trials with explicit DSS was drawn from the ECRIN clinical research metadata repository. The DSS were assessed and classified, by two experienced experts and one assessor with less experience in data sharing (DS), into different categories (unclear, no sharing, no plans, yes but vague, yes on request, yes with specified storage location, yes but with complex conditions).ResultsBetween the two experts the agreement was moderate to substantial (kappa=0.62, 95% CI [0.55, 0.70]). Agreement considerably decreased when these experts were compared with a third person who was less experienced and trained in data sharing (“assessor”) (kappa=0.33, 95% CI [0.25, 0.41]; 0.35, 95% CI [0.27, 0.43]). Between the two experts and under supervision of an independent moderator, a consensus was achieved for those cases, where both experts had disagreed, and the result was used as “gold standard” for further analysis. At least some degree of willingness of DS (data sharing) was expressed in 63.5% (127/200) cases. Of these cases, around one quarter (31/127) were vague statements of support for data sharing but without useful detail. In around half of the cases (60/127) it was stated that IPD could be obtained by request. Only in in slightly more than 10% of the cases (15/127) it was stated that the IPD would be transferred to a specific data repository. In the remaining cases (21/127), a more complex regime was described or referenced, which could not be allocated to one of the three previous groups. As a result of the consensus meetings, the classification system was updated.ConclusionThe study showed that the current DSS that imply possible data sharing are often not easy to interpret, even by relatively experienced staff. Machine based interpretation, which would be necessary for any practical application, is currently not possible. Machine learning and / or natural language processing techniques might improve machine actionability, but would represent a very substantial investment of research effort. The cheaper and easier option would be for data providers, data requestors, funders and platforms to adopt a clearer, more structured and more standardised approach to specifying, providing and collecting DSS.Trial registrationThe protocol for the study was pre-registered on ZENODO (https://zenodo.org/record/7064624#.Y4DIAHbMJD8).
Read full abstract