Leveraging Synthetic Data to Facilitate Research: A Collaborative Model for Analyzing Sensitive National Cancer Registry Data in England.

Abstract

Real-world data (RWD) are increasingly recognized, including by regulatory bodies, as critical to advancing drug development and health care delivery. However, stringent governance requirements, while essential for protecting patient privacy, create significant challenges for conducting research. The Cancer Analysis System (CAS), managed by National Health Service (NHS) England, includes a national cancer registry and linked health care datasets. To address data access challenges, Simulacrum, a set of publicly available synthetic datasets generated from the CAS, can be used to carry out preliminary data analysis, hypothesis generation, and development of programming code that can then be executed on CAS data. This paper presents a collaborative operating model that leverages Simulacrum to enable efficient, privacy-compliant analytics. Analysis of 18 projects conducted using this model demonstrated an average duration of 2.3 months from the start of Code Development to Data Release (CDDR). By enabling researchers to conduct privacy-compliant analysis on synthetic data, this approach increases transparency by providing insights into patient-level data while reducing reliance on custodians of sensitive data. Our findings highlight how synthetic data can be leveraged to facilitate efficient research on restricted patient-level RWD while safeguarding patient privacy. This framework offers a scalable solution for other data custodians that can enable broader use of RWD, accelerating health care innovation.

Similar Papers
  • Discussion
  • Cite count: 30
  • 10.1016/s2589-7500(21)00078-9
The challenges and opportunities of mental health data sharing in the UK
  • May 24, 2021
  • The Lancet Digital Health
  • Tamsin Ford + 22 more

  • Book Chapter
  • 10.5772/intechopen.1013231
Perspective Chapter: A Regulatory Perspective on Data Quality Requirements for Medical Product Development and Testing
  • Jan 19, 2026
  • Rebecca E Ghosh + 2 more

This chapter provides a medical products regulatory perspective on data quality and integrity requirements, with a specific focus on Real-World Data (RWD) and synthetic data regulatory requirements. The chapter will cover clinical study data, routine or RWD sources, as well as synthetic data requirements when used for medical product development. The chapter starts by briefly defining data quality and integrity for regulatory purposes, as well as outlining regulatory requirements and referencing specific regulatory frameworks and guidance documents. Key dimensions of data quality are discussed (e.g., complete, contemporaneous, accurate, consistent, and attributable), and examples are included to illustrate how data quality has impacted regulatory decision-making. An overview of the use of RWD for regulatory purposes is provided, covering RWD use for clinical trials, post-marketing safety surveillance, federated analyses, and medical device development. Case studies of the use of RWD in a regulatory setting are provided, and the key RWD data quality requirements from a regulatory perspective are highlighted. The chapter then provides an overview of synthetic data, an emerging area of interest to address RWD limitations. It covers synthetic data generation approaches and applications within medical product development, as well as the different quality considerations with synthetic data for regulatory applications.

  • Research Article
  • Cite count: 12
  • 10.1200/cci.21.00013
Comparing Findings From a Friends of Cancer Research Exploratory Analysis of Real-World End Points With the Cancer Analysis System in England
  • Dec 1, 2021
  • JCO Clinical Cancer Informatics
  • Pia Horvat + 9 more

PURPOSE: This study compared real-world end points extracted from the Cancer Analysis System (CAS), a national cancer registry with linkage to national mortality and other health care databases in England, with those from diverse US oncology data sources, including electronic health care records, insurance claims, unstructured medical charts, or a combination, that participated in the Friends of Cancer Research Real-World Evidence Pilot Project 1.0. Consistency between data sets and between real-world overall survival (rwOS) was assessed in patients with immunotherapy-treated advanced non–small-cell lung cancer (aNSCLC). PATIENTS AND METHODS: Patients with aNSCLC, diagnosed between January 2013 and December 2017, who initiated treatment with approved programmed death ligand-1 (PD-(L)1) inhibitors until March 2018 were included. Real-world end points, including rwOS and real-world time to treatment discontinuation (rwTTD), were assessed using Kaplan-Meier analysis. A synthetic data set, Simulacrum, on the basis of conditional random sampling of the CAS data was used to develop and refine analysis scripts while protecting patient privacy. RESULTS: Characteristics (age, sex, and histology) of the 2,035 patients with immunotherapy-treated aNSCLC included in the CAS study were broadly comparable with US data sets. In CAS, a higher proportion (46.7%) of patients received a PD-(L)1 inhibitor in the first line than in US data sets (18%-30%). Median rwOS (11.4 months; 95% CI, 10.4 to 12.7) and rwTTD (4.9 months; 95% CI, 4.7 to 5.1) were within the range of US-based data sets (rwOS, 8.6-13.5 months; rwTTD, 3.2-7.0 months). CONCLUSION: The CAS findings were consistent with those from US-based oncology data sets. Such consistency is important for regulatory decision making. Differences observed between data sets may be explained by variation in health care settings, such as the timing of PD-(L)1 approval and reimbursement, and data capture.
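
The Kaplan-Meier analysis named in the abstract above can be illustrated with a minimal sketch. The function names (`kaplan_meier`, `median_time`) and the toy durations are hypothetical and are not taken from the CAS study or its analysis scripts; the sketch only shows the product-limit estimator and how a median such as rwOS is read off the resulting curve.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Product-limit estimate of survival at each distinct event time.

    durations: follow-up time per patient (for example, months).
    events: 1 if the event (for example, death) was observed, 0 if censored.
    Returns a list of (time, survival_probability) pairs.
    """
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    exit_counts = Counter(durations)  # events and censorings leaving the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Patients censored at time t still count as at risk at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exit_counts[t]
    return curve


def median_time(curve):
    """First time at which estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Toy data (hypothetical): five patients, two of them censored.
toy_curve = kaplan_meier([2, 3, 3, 5, 8], [1, 0, 1, 1, 0])
toy_median = median_time(toy_curve)
```

In practice an established survival library would normally be used, since it also provides the confidence intervals reported alongside the medians; this sketch omits them.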

  • Research Article
  • Cite count: 30
  • 10.1177/2168479018764662
The Use of Real-World Evidence and Data in Clinical Research and Postapproval Safety Studies.
  • Nov 1, 2018
  • Therapeutic Innovation & Regulatory Science
  • Mary Jo Lamberti + 5 more

The adoption and use of real-world evidence (RWE) is becoming increasingly important to drug development and patient safety. The Tufts Center for the Study of Drug Development (CSDD) conducted a benchmark survey of pharmaceutical and biotechnology companies and contract research organizations in a number of areas that support real-world data (RWD) and evidence, including operations and performance areas. Data were gathered on organizational functions, staff, roles and responsibilities, and skill sets required. Also, current and future allocation of budgets and spending were examined as well as return on investment measures. A total of 30 unique companies responded to the survey. Nearly all respondents (29/30 companies) reported that their organizations had an RWE function and most companies indicated that their RWE functions were increasing in size (21 companies). From a postapproval regulatory and labeling perspective, there were two primary areas for company use of RWD to generate evidence: one for postapproval safety studies, including decreasing the severity of a label warning or to support risk evaluation and mitigation strategies (REMS) (12/22 companies; 55%), which allows for real-world patient population data to inform safety decisions; and the other for postmarketing studies (13/23 companies; 57%). Developing greater insight into therapeutic area needs, gaining market access, and greater understanding of drug effectiveness were the top measures identified for return on investment for use of RWE. Expanding the use of RWE in regulatory decision making and increasing uses of real-world data by sponsors will fill the gaps that are critically needed for drug development and safety.

  • Front Matter
  • Cite count: 24
  • 10.1016/j.jtho.2021.11.002
Lung Cancer in the United Kingdom
  • Jan 21, 2022
  • Journal of Thoracic Oncology
  • Neal Navani + 8 more

  • Front Matter
  • Cite count: 4
  • 10.1016/j.bjps.2008.11.036
Pinnaplasty - A dwindling art in today's modern NHS
  • Jan 22, 2009
  • Journal of Plastic, Reconstructive & Aesthetic Surgery
  • Zeeshan Ahmad

  • Research Article
  • Cite count: 4
  • 10.1136/bmjopen-2023-077297
Cross-sectional analysis of use of real-world data in single technology appraisals of oncological medicine by the National Institute for Health and Care Excellence in 2011–2021
  • Mar 1, 2024
  • BMJ Open
  • Jiyeon Kang + 1 more

Objectives: This study aims to identify how real-world data (RWD) have been used in single technology appraisals (STAs) of cancer drugs by the National Institute for Health and Care Excellence (NICE). Design: Cross-sectional...

  • Research Article
  • Cite count: 1
  • 10.1016/j.jcpo.2024.100507
Analysis of factors associated with use of real-world data in single technology appraisals of cancer drugs by the National Institute for Health and Care Excellence
  • Sep 26, 2024
  • Journal of Cancer Policy
  • Jiyeon Kang + 1 more

Objectives: This study investigates factors associated with use of real-world data (RWD) in economic modelling for single technology appraisals (STAs) of cancer drugs by the National Institute for Health and Care Excellence (NICE) to improve systematic understanding of the use of RWD. Methods: The data were extracted from STAs of cancer drugs, for which NICE issued guidance between January 2011 and December 2022 (n=267). Binary regression was used to test hypotheses concerning the greater or lesser use of RWD. Bonferroni-Holm correction was used to control error rates in multiple hypotheses tests. Several explanatory variables were considered in this analysis, including time (Time), incidence rate of disease (IR), availability of direct treatment comparison (AD), generalisability of trial data (GE), maturity of survival data in trial (MS) and previous technology recommendations by NICE (PR). The primary outcome variable was any use of RWD. Secondary outcome variables were specific uses of RWD in economic models. Results: AD had a statistical negative association with any use of RWD whereas no associations with non-parametric and parametric use of RWD were found. Time had several statistical associations with use of RWD (validating survival distributions for the intervention, estimating progression-free survival for the intervention, estimating overall survival for comparators and transition probabilities). Conclusions: RWD were more likely to be used in economic modelling of cancer drugs when randomised controlled trials failed to provide relevant clinical information of the drug for appraisals, particularly in the absence of direct treatment comparisons. These results, based on analysis of data systematically collected from previous appraisals, suggest that uses of RWD were associated with data gaps in the economic modelling. While this result may support some of the claimed advantages of using RWD when evidence is absent, the extent to which use of RWD in indirect treatment comparisons reduces uncertainty is still to be determined.
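
The Bonferroni-Holm correction mentioned in the Methods above is a step-down procedure, sketched here in minimal form. The function name `holm_bonferroni` and the example p-values are hypothetical; they are not values from the study.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down procedure for multiple hypothesis tests.

    Returns a list of booleans, aligned with p_values, that is True
    where the corresponding null hypothesis is rejected.
    """
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared with alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Hypothetical p-values for four hypotheses.
decisions = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
```

The procedure controls the family-wise error rate like the plain Bonferroni correction, but it is never less powerful, because only the smallest p-value faces the full alpha / m threshold.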

  • Research Article
  • Cite count: 1
  • 10.1158/1538-7445.sabcs21-p5-19-02
Abstract P5-19-02: Methodological approaches to the use of real-world data (RWD) for medical products to treat breast cancer: An FDA oncology center of excellence evaluation of RWD submissions
  • Feb 15, 2022
  • Cancer Research
  • Melanie E Royce + 6 more

Background: Aligning with 21st Century Cures legislation, FDA is exploring various methodologies to advance appropriate uses of Real-World Data (RWD) to generate Real-World Evidence (RWE). Inclusion of RWD to support regulatory decision making has increased in oncology, and this review specifically focused on characterizing RWD submissions for the treatment of breast cancer (BC). Methods: A systematic search was conducted using internal FDA databases to identify RWD submissions from 2010 to 2020. Search terms included real world evidence, real world data, cancer registry, administrative claims, external control arm, and other terms relevant to RWD/RWE. Relevant regulatory submissions were reviewed, pre-defined common data elements were extracted, and the subset applicable to breast cancer was evaluated. Results: Of 142 regulatory submissions that included RWD, 6 specifically evaluated BC indications and 3 were for solid tumor indications with potential applicability to BC, corresponding to 4 new molecular entities. Regulatory objectives included support for labeling changes including efficacy (expanded indications), safety, and dose or administration modifications. The most commonly used design was a retrospective observational study with structured electronic health records (EHRs) or medical claims data, supplemented by unstructured data from medical records or chart review for missing data elements. Four of the 6 BC submissions were significantly limited by a high degree of data missingness and confounding, with some studies including key covariates that were missing in >50% of the structured data. RWD was used to provide contextual evidence for label expansion for populations not included or adequately represented in the registration trial. Of note, for the application expanding the label to include treatment of male BC, the regulatory decision was primarily based on clinical trial data.
The primary rwEndpoints submitted were overall survival (rwOS), progression free survival (rwPFS), response rate (rwORR) and time to next treatment (TTNT). Safety outcomes were investigated in all but 1 of the studies, most commonly as a secondary RWD endpoint. Conclusion: In our review of regulatory submissions relevant to breast cancer therapies, RWD has largely been used to contextualize and complement prospective clinical trial data. Evaluating that selected RWD is fit for purpose to address the regulatory objective(s) and all analytical plans are prespecified allows for robust data characterization, and appropriate evaluation. Data relevance (availability of key variables) along with reliability assessment which includes evaluating data for completeness, consistency, and trends over time are necessary for the rigorous evaluation of RWE in drug development. Data missingness is a key issue in RWD, especially when structured data are not available and specific variables are unlikely to be captured in a reliable way in the unstructured data or further validation is not feasible. To optimize RWD as evidence for specific patient populations, attention to the proportion of patients excluded is necessary to avoid concerns regarding the generalizability of the data. Careful selection of rwEndpoints must be aligned with the study design and objective, include data such as prior, concomitant and subsequent anti-cancer treatments, and the ability for outcome validation to be methodologically appropriate. When contemplating a regulatory submission using RWD, early consultation with the appropriate FDA review division can provide additional feedback on the appropriate use of RWD or pragmatic designs. Citation Format: Melanie E Royce, Jennifer J. Lee, Christy L. Osgood, Laleh Amiri-Kordestani, Julia A. Beaver, Paul G. Kluetz, Donna R. Rivera. 
Methodological approaches to the use of real-world data (RWD) for medical products to treat breast cancer: An FDA oncology center of excellence evaluation of RWD submissions [abstract]. In: Proceedings of the 2021 San Antonio Breast Cancer Symposium; 2021 Dec 7-10; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2022;82(4 Suppl):Abstract nr P5-19-02.

  • Research Article
  • Cite count: 1
  • 10.1016/j.jval.2017.08.017
Using Real-World Data (RWD) in Health Technology Assessment (HTA) Practice: A Comparative Study of 5 HTA Agencies
  • Oct 1, 2017
  • Value in Health
  • A Makady + 8 more

  • Research Article
  • Cite count: 36
  • 10.1002/pds.4697
Considerations in characterizing real-world data relevance and quality for regulatory purposes: A commentary.
  • Dec 5, 2018
  • Pharmacoepidemiology and Drug Safety
  • Cynthia J Girman + 3 more

The 21st Century Cures Act of 2016 provided a framework to the US Food and Drug Administration (FDA) to rapidly move treatments to patients.1 The increased acceptability of real‐world data (RWD) sources allows for innovative ways to study products and has the potential to reduce trial costs. Published papers provide guidance regarding data quality issues, reproducibility, and validity assessment.2 Rapid evolvement of electronic health records (EHRs) encourages greater consideration of their use in research.1, 2, 3, 4, 5, 6 For years, the FDA has relied on epidemiological studies of postapproval product safety using RWD5, 6 (eg, administrative claims and EHR) and for device effectiveness studies4; however, regulatory use for evaluating drug effectiveness has been rare. As part of the Prescription Drug User Fee Act (PDUFA VI),3 use of RWD is being considered for potential contributions to evaluating effectiveness and safety of new indications for approved products and to satisfy postapproval study requirements. Recently, the Duke Margolis Center for Health Policy held workshops and issued two papers on this topic.5, 6 The first paper focused on defining RWD as data routinely collected pertinent to patient health status and/or delivery of care, and the use of RWD in regulatory and clinical contexts.5 The second white paper from the October 1, 2018, workshop focused on data relevancy and quality, including cleaning, transforming, and linking RWD to characterize RWD sources as “fit for regulatory purpose.”6 These papers offer a practical “commonsense” high‐level view of primary data and methods considerations for RWD use from a regulatory perspective, facilitating discussion around regulatory uses of RWD within the research community and industry. However, salient points are missing from the papers and the RWD discussions among FDA, researchers, and industry.
Here, we provide a commentary on the data considerations discussed in the white papers and highlight pertinent considerations with respect to RWD in the context of whether data are relevant, representative, and robust.
1.1. Data relevance
The recent white paper defines data relevance dimensions including representativeness of the population of interest, critical data field availability, accurate linking at the patient level with multiple data sources, and adequate sample size and follow‐up time to demonstrate expected treatment effects.6 Guidance from FDA on how to ensure RWD are fit for purpose and adequate to support regulatory decisions would be helpful on each dimension. Determining if RWD is fit for regulatory purpose is a “contextual exercise” where the specific research question, regulatory use, and data characteristics drive what meaningful conclusions can be drawn.6 Covariates may be critical for one research question but not another. Exposures and outcomes should be well defined when part of the research question but may not be critical for natural history studies. There is no “one‐size‐fits‐all” approach, and critical data components should be evaluated for each research question and regulatory use.7 A framework is needed to guide choice and evaluation of critical data elements for specific research questions for regulatory use. Representativeness of the population of interest is gauged in many ways. Recent FDA guidance on Patient Focused Drug Development suggests a statistical sampling approach be used to obtain patient experience data representative of the target population.8 However, most US real‐world databases use administrative claims or EHR for patients seeking medical attention. These RWD sources should be considered broadly representative of the population eligible for using most, if not all, new products and services.
“Representativeness” should be assessed broadly in the context of likely product users with some diversity in geography, health status, and health care system as appropriate for the specific research question and regulatory context. While data linkage is likely to limit the eligible sample, it may be needed to increase the informative nature of RWD, especially with increasing evaluations to support precision medicine. Sample size should be derived based on anticipated treatment effects for studies of treatment effectiveness or safety, whether comparative or not, to ensure appropriate precision of estimates. For rare diseases, there should be flexibility given data sparseness worldwide, as indicated in the FDA guidance on rare disease.8 Additional guidance would be useful regarding how “accurate linking” should be assessed since linking 100% of patients with administrative claims and EHR is impractical. Would FDA accept limited linked data if it was supplemental to cruder variables in the full dataset? Would a subset of 60% be adequate? In the context of probabilistic linkage, what level of certainty would constitute adequate linkage? Salience of linkable individuals to the specific research question should be considered in this determination and pre‐specified sensitivity analyses should help assess robustness of results and conclusions.9, 10

  • Abstract
  • 10.1002/alz70861_108648
Development and preliminary findings of a scoping review on the use of real‐world data in health services research for Alzheimer’s disease and related dementias
  • Dec 1, 2025
  • Alzheimer's & Dementia
  • Ashley Kuzmik + 7 more

Background: The Alzheimer’s Association has prioritized advancing health services research (HSR) to improve care and outcomes for people living with Alzheimer’s disease and related dementias (AD/ADRD). HSR plays a critical role in investigating how care and treatment for AD/ADRD are accessed, delivered, and experienced. Real‐world data (RWD), including electronic health records (EHRs), claims, administrative, and registry data, offer opportunities to assess healthcare utilization, quality, and population‐level outcomes. This scoping review aims to map the use of RWD in AD/ADRD‐focused HSR in the United States and identify gaps and priorities to advance HSR and inform high‐value care. This review addresses the research question: How is RWD being used in recent HSR focused on AD/ADRD in the United States? Method: This review follows the Arksey and O’Malley framework and PRISMA‐ScR guidelines. A protocol was developed with input from experts across healthcare delivery, policy, public health, and psychosocial research, shaping the research question, database selection, keywords, and coding framework. A systematic search (2020–present) was conducted across PubMed, Embase, CINAHL, Web of Science, and EconLit. Seven reviewers independently screened articles using Covidence. Thematic analysis will be conducted using MAXQDA, followed by a consensus‐building workshop with healthcare ecosystem representatives to refine findings and inform future research. Result: A total of 2,354 articles were identified; 599 full‐text studies were assessed for eligibility, and 520 were included. Data extraction is ongoing. Preliminary findings show variation in RWD (e.g., claims, EHRs, registries), study designs, and populations. Common topics include healthcare utilization, treatment outcomes, and health and social characteristics of people with AD/ADRD. Studies span from diagnosis to end‐of‐life care. Some studies used quasi‐experimental or machine‐learning methods and novel sources, such as clinical notes, to examine outcomes. Gaps include limited data integration, underrepresented populations, and inconsistent evaluation of care models. Conclusion: This review provides an overview of how RWD is used in AD/ADRD‐focused HSR and highlights where future work is needed. Findings will support stronger use of RWD in scientific studies, improve treatment and support for people living with dementia, and inform healthcare planning and decision‐making across settings and populations.

  • Research Article
  • Cite count: 1
  • 10.1017/s0266462325103425
Use of RWD in the assessment of economic evaluations of innovative health products: lessons learned from the French National Authority for Health (HAS).
  • Jan 5, 2026
  • International Journal of Technology Assessment in Health Care
  • Salah Ghabri + 1 more

Our objective was to identify key patterns and discuss the lessons learned from the use of real-world data (RWD) in the cost-effectiveness analyses (CEAs) of innovative health products (IHPs) as assessed by the French National Authority for Health (HAS) from January 2016 to May 2023. A retrospective analysis was conducted on the use of RWD in the CEAs of IHPs. Our material included HAS assessments of CEAs and manufacturers' technical reports. The RWD studies were classified into eight categories, and a specific template was constructed to report and discuss their use in terms of predefined methodological aspects. In all, 88 percent (129/147) of the CEAs integrated RWD studies. Retrospective cohorts were the most frequently used kind of study in the CEAs, while prospective cohorts were mainly used to identify the analyzed population and to externally validate models. We identified opposing temporal trends in the use of cohort studies versus registries. Approximately 8 percent (10/129) of the CEAs could be adjudged as invalidated due to major limitations regarding RWD use (e.g., lack of relative effectiveness). We learned several lessons from the use of RWD in the HAS assessments of the CEAs of IHPs. Retrospective cohort studies were the most commonly used RWD source to populate CEA parameters regardless of the type of IHP, and their use has increased over time. The implementation of good practices for the use of RWD studies should improve the role of RWD in economic modeling and address uncertainties surrounding CEAs.

  • Research Article
  • Cite count: 1
  • 10.1093/bjd/ljae090.402
BT04 Exploring sustainability: the shift to digital patient information leaflets in dermatology
  • Jun 28, 2024
  • British Journal of Dermatology
  • Yasmin Nikookam + 2 more

The advent of digital health technologies has transformed the landscape of healthcare delivery and sustainable practice, with dermatology being no exception. National Health Service (NHS) England reported carbon emissions of 27.1 million tonnes from the health and social care system in 2017, equating to 6.3% of England’s emissions [NHS England. Delivering a ‘net zero’ National Health Service. Available at: https://www.england.nhs.uk/greenernhs/wp-content/uploads/sites/51/2020/10/delivering-a-net-zero-national-health-service.pdf (last accessed 19 March 2024)]. Subsequently, the NHS campaign ‘For a Greener NHS’ was introduced to ensure stakeholders are mindful of their healthcare delivery. This project delves into the sustainability implications of transitioning from traditional paper-based patient information leaflets (PILs) to online platforms within the field of dermatology. Focusing on environmental, economic, patient and healthcare delivery perspectives, the project aims to elucidate the benefits and challenges associated with this digital shift. A quality improvement project was used to propose a new lean pathway for educating patients using digital PILs. Quantitative data assessed printing habits from the outpatient department printer over 2 weeks prior to any intervention. Printing habits were reassessed for a further 2 weeks after QR codes to PILs were left in the dermatology clinic rooms and clinicians were encouraged to avoid printing PILs where feasible. Carbon emissions and cost analyses were calculated to quantify the effect of digital PILs. Qualitative data through a survey disseminated to patients and clinicians assessed the impact of digital PILs on healthcare delivery, patient engagement, and information accessibility. The results demonstrated that over the first 2-week period 1723 pages were printed, which equated to 8615g CO2 equivalent.
Implementation of the digital PILs resulted in 1596 pages printed, equating to 7980g CO2 equivalent. This equates to 16 510g of CO2 emissions and £317.50 in costs potentially saved annually when switching to digital PILs. This is equivalent to the CO2 emitted by fully charging 2068 smartphones [US Environmental Protection Agency. Greenhouse Gas Equivalencies Calculator. 2024. Available at: https://www.epa.gov/energy/greenhouse-gas-equivalencies-calculator (last accessed 19 March 2024)]. Preliminary qualitative results illustrated the reluctance of particular age groups to engage in digital PILs due to poor technology literacy. However, there was an awareness among patients and dermatologists to implement sustainable education. Our findings demonstrate that digital PILs foster a reduction in the carbon footprint and costs. The increasing prevalence of smartphones and internet connectivity gives digital platforms the potential to enhance patient education, engagement, and adherence to treatment plans. However, considerations related to digital literacy, accessibility and data security need to be addressed to ensure equitable healthcare delivery. Through a synthesis of environmental, economic and healthcare delivery perspectives, our findings aim to guide healthcare providers in making informed decisions regarding the adoption of digital platforms for sustainable patient information dissemination.
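
The annual saving quoted above can be reproduced from the two fortnightly figures. The short sketch below does so under one assumption that the abstract does not state explicitly: that the two-week saving is scaled by 26 fortnights per year. All variable names are illustrative.

```python
# Figures reported in the abstract (two-week audit periods).
PAGES_BEFORE, GRAMS_BEFORE = 1723, 8615  # before digital PILs
PAGES_AFTER, GRAMS_AFTER = 1596, 7980    # after digital PILs

# Implied emission factor: grams of CO2 equivalent per printed page.
grams_per_page = GRAMS_BEFORE / PAGES_BEFORE

# Saving over one two-week period.
pages_saved = PAGES_BEFORE - PAGES_AFTER
grams_saved_per_fortnight = GRAMS_BEFORE - GRAMS_AFTER

# Assumption (not stated in the abstract): 26 two-week periods per year.
FORTNIGHTS_PER_YEAR = 26
annual_grams_saved = grams_saved_per_fortnight * FORTNIGHTS_PER_YEAR

print(f"{grams_per_page:g} g CO2e per page")
print(f"{pages_saved} pages and {grams_saved_per_fortnight} g saved per fortnight")
print(f"{annual_grams_saved} g saved per year")
```

Under that assumption the result matches the 16 510 g reported in the abstract, and the same 5 g per page factor accounts for both fortnightly totals.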

  • News Article
  • Cite count: 1
  • 10.1016/s0140-6736(14)60120-3
Health on the agenda in Scottish independence referendum
  • Jan 30, 2014
  • The Lancet
  • Neil Bennet
