Abstract

Registries of clinical studies such as ClinicalTrials.gov are an important source of information. However, the process of manually entering metadata is prone to errors which impedes their use and thereby the overall usefulness of the registry. In this work, we propose a generic approach towards detection of errors in the metadata by using the Shapes Constraint Language for defining rule templates covering constraints regarding value type and cardinality. We developed a Python 3 algorithm for the automatic validation of 15 rule instances applied to the whole ClinicalTrials.gov database (355,862 studies; 27th October 2020) resulting in more than 5 million metadata verifications. Our results show a large number of errors in different metadata fields, such as i) missing values, ii) values not coming from a predefined set or iii) wrong cardinalities, can be detected using this approach. Since 2015 approximately 5% of all studies contain one or more errors. In the future, we will apply this technique to other registries and develop more complex rules by focusing on the semantics of the metadata. This could render the possibility of automatically correcting entries, increasing the value of registries of clinical studies.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.