Introduction/aims: Healthcare systems data (also known as real-world or routinely collected health data) could transform the conduct of clinical trials. Demonstrating the integrity and provenance of these data is critical for clinical trials, both to enable their use where appropriate and to avoid duplication that consumes scarce trial resources. Building on previous work, this proof-of-concept study used a data intelligence tool, the "Central Metastore," to provide metadata and lineage information for nationally held data. Methods: The study assessed the feasibility of using NHS England's Central Metastore to capture detailed records of the origins, processes, and methods that produce four datasets. These were three of England's Hospital Episode Statistics datasets (Admitted Patient Care, Outpatients, Critical Care) and the Civil Registration of Deaths (England and Wales). The process comprised three steps: information gathering; information ingestion using the tool; and auto-generation of lineage diagrams and content to show data integrity. A guidance document was developed to standardise this process. Results/Discussion: The tool can ingest, store and display data provenance in sufficient detail to support trust and transparency in using these datasets for trials. The slowest step was gathering information from multiple sources, so consistency in record-keeping is essential.