Articles published on Use Of External Data
56 Search results
- Research Article
- 10.1080/19466315.2026.2646537
- Mar 24, 2026
- Statistics in Biopharmaceutical Research
- Vivienn Weru + 4 more
The formal use of external data can increase the efficiency of clinical trial designs, enabling smaller sample sizes in settings where recruitment is challenging. In the Bayesian framework, such borrowing is achieved through informative priors, but substantial inconsistency (“drift”) between external and current trial data can compromise inference. Robust mixture priors address this risk by blending an informative prior with a diffuse component, allowing dynamic borrowing that adapts to the degree of drift. Yet guidance on choosing the mixture’s form, mean, variance, and weight remains limited. We study how these tuning parameters affect inference in one-arm and hybrid-control trials with normally distributed endpoints. Because the robust component is intended to protect against prior misspecification under drift, and these parameters determine the prior’s behavior across drift scenarios, we assess their impact on key operating characteristics. All four quantities substantially influence Type I error, power, and estimation accuracy. As expected, the robust component’s variance governs robustness, but its location has a strong and often underappreciated effect on testing and estimation error. Moreover, the mixture weight interacts closely with the robust component’s location and variance. We provide practical recommendations for selecting robust component parameters, mixture weights, alternative functional forms, and strategies for evaluating operating characteristics.
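The dynamic borrowing described in this abstract can be illustrated with a minimal sketch, assuming a normal endpoint with known standard error and a two-component normal mixture prior. The function names and all numeric values below are hypothetical and are not taken from the article; the update itself is the standard conjugate result for normal mixtures.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def robust_mixture_posterior(ybar, se, w_inf, m_inf, s_inf, m_rob, s_rob):
    """Update the prior w_inf*N(m_inf, s_inf^2) + (1-w_inf)*N(m_rob, s_rob^2)
    with an observed mean ybar ~ N(theta, se^2).

    Returns [(weight, mean, sd), ...] with the informative component first.
    """
    components = []
    for w, m, s in ((w_inf, m_inf, s_inf), (1.0 - w_inf, m_rob, s_rob)):
        # Marginal likelihood of ybar under this component drives the new weight.
        marginal = normal_pdf(ybar, m, math.sqrt(s * s + se * se))
        # Conjugate normal-normal update within the component.
        post_var = 1.0 / (1.0 / (s * s) + 1.0 / (se * se))
        post_mean = post_var * (m / (s * s) + ybar / (se * se))
        components.append((w * marginal, post_mean, math.sqrt(post_var)))
    total = sum(c[0] for c in components)
    return [(wt / total, mu, sd) for wt, mu, sd in components]

# Hypothetical settings: informative prior N(0, 0.1^2) with weight 0.8,
# robust component N(0, 1^2), current-trial standard error 0.2.
no_drift = robust_mixture_posterior(0.0, 0.2, 0.8, 0.0, 0.1, 0.0, 1.0)
drift = robust_mixture_posterior(1.0, 0.2, 0.8, 0.0, 0.1, 0.0, 1.0)
print("weight on informative component, no drift:", round(no_drift[0][0], 3))
print("weight on informative component, drift:   ", round(drift[0][0], 3))
```

When the observed mean agrees with the informative component, its posterior weight rises above the prior weight; under large drift the weight shifts almost entirely to the robust component, which is the adaptive behavior the abstract refers to. Changing `m_rob` or `s_rob` in this sketch is one way to see why the location and variance of the robust component matter.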
- Research Article
- 10.1177/17407745251385535
- Nov 5, 2025
- Clinical trials (London, England)
- Isaac J Egesa + 4 more
The Sequential Multiple-Assignment Randomised Trial (SMART) design is considered the gold standard for developing adaptive interventions, which tailor treatments to individual patient characteristics and responses. While SMART offers a rigorous framework aligned with real-world clinical decision-making, it is often complex, time-consuming, and costly. As interest in SMART design grows, there is increasing recognition of the need to improve its implementation through more explicit guidance and best practices. Efficiency gains may also be possible by incorporating external data to inform their design, conduct, and analysis. This review aimed to identify all published trials using the SMART design, summarise their design, conduct, and reporting practices and evaluate the use of external data in their implementation. We searched PubMed, Medline, PsycINFO, Scopus, and Web of Science databases for all SMART up to June 30, 2024. External data were defined as non-simulated individual patient data collected outside the main SMART to supplement or inform the main trial. We included 80 SMART, of which 35 (44%) were completed and 45 (56%) were ongoing. Most trials reported two phases of randomisation (93%), with the primary aim focusing on evaluating main effects (81%) of interventions at the first stage of randomisation. There was inadequate reporting of several key aspects, including sample size estimation, statistical analysis software, allocation concealment, data missingness, multiple testing, sensitivity analysis, and the use of SMART in the title. Seventeen (21%) SMART (4 completed trials and 13 trial protocols) referred to the use of external data from electronic health records (n = 12) and registries (n = 5). External data were used for recruitment (n = 11), outcome measures (n = 6), and to provide baseline covariate information (n = 1).
SMART designs are increasingly used to develop adaptive interventions across diverse clinical contexts, yet key methodological features and basic components remain inconsistently reported. This limits transparency, reproducibility, and potential for translation into routine care. Although external data are widely used in standard randomised controlled trials, their use in the SMART is still limited, likely due to methodological and infrastructural challenges and the absence of tailored reporting standards. To improve the efficiency and generalisability of SMART designs, expert-led extensions of CONSORT and SPIRIT guidelines are needed, including specific recommendations for reporting external data use. Future research should explore optimal external data sources for informing SMART components and promote interdisciplinary collaboration and training to support high-quality implementation.
- Research Article
- 10.1177/09622802251367439
- Sep 15, 2025
- Statistical methods in medical research
- Xuetao Lu + 1 more
The use of external data in clinical trials offers numerous advantages, such as reducing enrollment, increasing study power, and shortening trial duration. In Bayesian inference, information in external data can be transferred into an informative prior for future borrowing (i.e. prior synthesis). However, multisource external data often exhibit heterogeneity, which can cause information distortion during prior synthesis. Clustering helps identify the heterogeneity, enhancing the congruence between the synthesized prior and the external data. Obtaining optimal clustering is challenging due to the trade-off between congruence with external data and robustness to future data. We introduce two overlapping indices: the overlapping clustering index and the overlapping evidence index. Using these indices alongside a K-means algorithm, the optimal clustering result can be identified by balancing this trade-off and applied to construct a prior synthesis framework to effectively borrow information from multisource external data. By incorporating the (robust) meta-analytic predictive (MAP) prior within this framework, we develop (robust) Bayesian clustering MAP priors. Simulation studies and real-data analysis demonstrate their advantages over commonly used priors in the presence of heterogeneity. Since the Bayesian clustering priors are constructed without needing the data from the prospective study, they can be applied to both study design and data analysis in clinical trials.
- Research Article
- 10.1002/cpt.70010
- Jul 31, 2025
- Clinical pharmacology and therapeutics
- Yuanyuan Zhao + 11 more
Traditional randomized controlled trials (RCTs) face increasing challenges due to lengthy recruitment and high costs. Regulators have encouraged the use of external data and real-world evidence (RWE) to improve efficiency, yet adoption in confirmatory settings remains limited by concerns over heterogeneity and bias. We conducted a proof-of-concept study to assess the feasibility and regulatory value of a hybrid Bayesian borrowing design to support a Phase III RCT of Dexamethasone Intracameral Drug-Delivery Suspension (DEXYCU) in China. Using the Equivalence Probability Propensity Score Meta-Analytic-Predictive (EQPSMAP) approach, we integrated three data sources: a global RCT, a regional Phase III RCT in China, and a real-world data (RWD) source in China. The method's performance was evaluated via point estimates and 95% credible intervals for the primary efficacy endpoint. The hybrid design based on EQPSMAP demonstrated greater robustness and accuracy in the presence of baseline imbalances and heterogeneous data. Compared to a traditional RCT, the hybrid design reduced the required sample size by 41 to 158 patients and shortened trial duration by approximately 2 to 5 months while preserving internal validity. This study demonstrated the feasibility and regulatory value of hybrid Bayesian designs in late-phase trials. The approach offers a practical, bias-controlled framework for integrating external data into regional drug development and regulatory decision making.
- Research Article
- 10.1093/biomet/asaf047
- Jul 10, 2025
- Biometrika
- B Ren + 4 more
In oncology the efficacy of novel therapeutics often differs across patient subgroups, and these variations are difficult to predict during the initial phases of the drug development process. The relation between the power of randomized clinical trials and heterogeneous treatment effects has been discussed by several authors. In particular, false negative results are likely to occur when the treatment effects concentrate in a subpopulation but the study design did not account for potential heterogeneous treatment effects. The use of external data from completed clinical studies and electronic health records has the potential to improve decision-making throughout the development of new therapeutics, from early-stage trials to registration. Here we discuss the use of external data to evaluate experimental treatments with potential heterogeneous treatment effects. We introduce a permutation procedure to test, at the completion of a randomized clinical trial, the null hypothesis that the experimental therapy does not improve the primary outcomes in any subpopulation. The permutation test leverages the available external data to increase power. Also, the procedure controls the false positive rate at the desired α-level without restrictive assumptions on the external data, for example, in scenarios with unmeasured confounders, different pre-treatment patient profiles in the trial population compared to the external data, and other discrepancies between the trial and the external data. We illustrate that the permutation test is optimal according to an interpretable criterion and discuss examples based on asymptotic results and simulations, followed by a retrospective analysis of individual patient-level data from a collection of glioblastoma clinical trials.
- Research Article
- 10.1016/j.blre.2025.101324
- Jul 1, 2025
- Blood reviews
- Subodh Selukar + 2 more
Synthetic control arms and other uses of external data in clinical trials for hematological malignancies.
- Research Article
2
- 10.1080/13696998.2025.2506968
- May 17, 2025
- Journal of Medical Economics
- Audrey Petitjean + 3 more
Aim: To assess the use of external evidence for overall survival (OS) estimation in oncology single-technology appraisals (STAs) by the National Institute for Health and Care Excellence (NICE). Methods: STAs for oncology drugs appraised by NICE between January 2021 and March 2023 were identified. For each eligible STA, OS extrapolation methods used, the rationale for using external data, the source and type of data, and information on acceptance by the evidence review group (ERG) and the appraisal committee were extracted. Results: Initially, 215 STAs were identified, of which 82 were eligible for the study. Of these, 32 STAs used external data for OS extrapolation, including trial data (44%), real-world data (47%), clinical opinion (25%), meta-analysis (1%) and previous STA (1%). External data were used more frequently in state-transition models for post-event transitions and cure assumptions, and in partitioned-survival models to replace pivotal trial OS, inform long-term survival estimates or to estimate OS based on surrogacy analysis. Sensitivity analyses on the use of external data were explored in 16 (50%) of the STAs. The committee accepted use of external data in half of the analysed STAs, acknowledging uncertainty in OS extrapolation. Limitations: The analysis was limited to the STAs published between 2021 and 2023 and publicly available materials on the NICE website. Conclusion: This study provides an overview of external data used to estimate OS in oncology STAs conducted by NICE in recent years. External data, including trial data, real-world data and clinical opinions, were incorporated into recent oncology STAs at various modelling stages. ERGs and appraisal committees were generally accepting of the use of external data. However, it is crucial to conduct a sensitivity analysis and provide a justification for the methods and data source selection.
- Research Article
- 10.62123/enigma.v2i2.55
- Apr 11, 2025
- Electronic Integrated Computer Algorithm Journal
- Fitria Widianingsih + 1 more
This study aims to integrate Artificial Intelligence (AI) and Machine Learning (ML) technologies with Collaborative Filtering (CF) to build a more accurate and personalized movie recommendation system. This system uses the Singular Value Decomposition (SVD) algorithm to reduce the dimensionality of data and generate rating predictions for users of movies they have not watched. This study implements a dataset from MovieLens to test the effectiveness of the model in providing recommendations. The experimental results show that the system successfully predicts user ratings with fairly high accuracy, reflected in the average Root Mean Square Error (RMSE) value of 0.85 for the five users tested. Although these results show good performance, challenges such as cold start problems and data sparsity are still major obstacles in producing more optimal recommendations. Therefore, this study also proposes the use of hybrid filtering, deep learning, and the use of external data to improve prediction accuracy and overcome these limitations.
- Research Article
1
- 10.1080/19466315.2025.2455178
- Mar 10, 2025
- Statistics in Biopharmaceutical Research
- Kristine Broglio + 4 more
Some populations, such as rare diseases, cannot be studied in randomized clinical trials due to feasibility or ethical considerations. One way to address this is to incorporate external data to augment what is learned in the trial about a standard of care control arm. Statistically, there are two broad families of approaches for incorporating external controls for comparisons, propensity score-based methods (PSM) and Bayesian dynamic borrowing (BDB) methods. We evaluate these methods in terms of bias and precision with patient-level data from a large cardiovascular trial. We consider a hybrid trial setting, where external data augments a concurrently randomized control arm, and a single-arm trial using external data as a formal comparator. We evaluate performance with and without systemic biases between the trial and the external controls. The performance of PSM depends on the extent to which covariates are associated with the outcome. The performance of BDB depends on the choice of modeling parameters. Overall, there is a precision-bias tradeoff in the use of external data. In practice, it is appropriate that the observed treatment effect needs to not just achieve statistical significance, but also qualitatively overwhelm the possibility that the observed treatment effect is driven by systematic differences between data sources.
- Research Article
2
- 10.1016/j.esmoop.2024.104094
- Jan 1, 2025
- ESMO Open
- Tulika Rudra Gupta + 5 more
Informative Censoring in Externally Controlled Clinical Trials: A Potential Source of Bias
- Abstract
- 10.1016/j.jval.2023.09.1773
- Dec 1, 2023
- Value in Health
- F Reitsma + 1 more
HTA89 An Increased Use of External Data to Inform Survival Extrapolations in NICE Technology Appraisals
- Research Article
5
- 10.1016/j.jval.2023.10.003
- Oct 17, 2023
- Value in Health
- Sangyu Lee + 4 more
Objectives: Parametric models are used to estimate the lifetime benefit of an intervention beyond the range of trial follow-up. Recent recommendations have suggested more flexible survival approaches and the use of external data when extrapolating. Both of these can be realized by using flexible parametric relative survival modeling. The overall aim of this article is to introduce and contrast various approaches for applying constraints on the long-term disease-related (excess) mortality including cure models and evaluate the consequent implications for extrapolation. Methods: We describe flexible parametric relative survival modeling approaches. We then introduce various options for constraining the long-term excess mortality and compare the performance of each method in simulated data. These methods include fitting a standard flexible parametric relative survival model, enforcing statistical cure, and forcing the long-term excess mortality to converge to a constant. We simulate various scenarios, including where statistical cure is reasonable and where the long-term excess mortality persists. Results: The compared approaches showed similar survival fits within the follow-up period. However, when extrapolating the all-cause survival beyond trial follow-up, there is variation depending on the assumption made about the long-term excess mortality. Altering the time point from which the excess mortality is constrained enables further flexibility. Conclusions: The various constraints can lead to applying explicit assumptions when extrapolating, which could lead to more plausible survival extrapolations. The inclusion of general population mortality directly into the model-building process, which is possible for all considered approaches, should be adopted more widely in survival extrapolation in health technology assessment.
- Research Article
7
- 10.1158/1078-0432.ccr-22-3524
- Mar 20, 2023
- Clinical cancer research : an official journal of the American Association for Cancer Research
- Rifaquat Rahman + 6 more
Drug development can be associated with slow timelines, particularly for rare or difficult-to-treat solid tumors such as glioblastoma. The use of external data in the design and analysis of trials has attracted significant interest because it has the potential to improve the efficiency and precision of drug development. A recurring challenge in the use of external data for clinical trials, however, is the difficulty in accessing high-quality patient-level data. Academic research groups generally do not have access to suitable datasets to effectively leverage external data for planning and analyses of new clinical trials. Given the need for resources to enable investigators to benefit from existing data assets, we have developed the Glioblastoma External (GBM-X) Data Platform which will allow investigators in neuro-oncology to leverage our data collection and obtain analyses. GBM-X strives to provide an uncomplicated process to use external data, contextualize single-arm trials, and improve inference on treatment effects early in drug development. The platform is designed to welcome interested collaborators and integrate new data into the platform, with the expectation that the data collection can continue to grow and remain updated. With such features, GBM-X is designed to help to accelerate evaluation of therapies, to grow with collaborations, and to serve as a model to improve drug discovery for rare and difficult-to-treat tumors in oncology.
- Research Article
8
- 10.1007/s11042-023-14981-2
- Mar 11, 2023
- Multimedia Tools and Applications
- Sruthy Manmadhan + 1 more
The goal of medical visual question answering (Med-VQA) is to correctly answer a clinical question posed about a medical image. Medical images are fundamentally different from images in the general domain. As a result, applying general-domain Visual Question Answering (VQA) models directly to the medical domain is impossible. Furthermore, the large-scale data required by VQA models is rarely available in the medical arena. Existing approaches to medical visual question answering often rely on transfer learning with external data to generate good image feature representation and use cross-modal fusion of visual and language features to compensate for the lack of labelled data. This research provides a new parallel multi-head attention framework (MaMVQA) for dealing with Med-VQA without the use of external data. The proposed framework addresses image feature extraction using the unsupervised Denoising Auto-Encoder (DAE) and language feature extraction using term-weighted question embedding. In addition, we present qf-MI, a unique supervised term-weighting (STW) scheme based on the concept of mutual information (MI) between the word and the corresponding class label. Extensive experimental findings on the VQA-RAD public medical VQA benchmark show that the proposed methodology outperforms previous state-of-the-art methods in terms of accuracy while requiring no external data to train the model. Remarkably, the presented MaMVQA model achieved significantly increased accuracy in predicting answers to both close-ended (78.68%) and open-ended (55.31%) questions. Also, an extensive set of ablations is studied to demonstrate the significance of individual components of the system.
- Research Article
4
- 10.1200/po.22.00606
- Feb 1, 2023
- JCO precision oncology
- Alejandra Avalos-Pacheco + 6 more
Adaptive clinical trials use algorithms to predict, during the study, patient outcomes and final study results. These predictions trigger interim decisions, such as early discontinuation of the trial, and can change the course of the study. Poor selection of the Prediction Analyses and Interim Decisions (PAID) plan in an adaptive clinical trial can have negative consequences, including the risk of exposing patients to ineffective or toxic treatments. We present an approach that leverages data sets from completed trials to evaluate and compare candidate PAIDs using interpretable validation metrics. The goal is to determine whether and how to incorporate predictions into major interim decisions in a clinical trial. Candidate PAIDs can differ in several aspects, such as the prediction models used, timing of interim analyses, and potential use of external data sets. To illustrate our approach, we considered a randomized clinical trial in glioblastoma. The study design includes interim futility analyses on the basis of the predictive probability that the final analysis, at the completion of the study, will provide significant evidence of treatment effects. We examined various PAIDs with different levels of complexity to investigate if the use of biomarkers, external data, or novel algorithms improved interim decisions in the glioblastoma clinical trial. Validation analyses on the basis of completed trials and electronic health records support the selection of algorithms, predictive models, and other aspects of PAIDs for use in adaptive clinical trials. By contrast, PAID evaluations on the basis of arbitrarily defined ad hoc simulation scenarios, which are not tailored to previous clinical data and experience, tend to overvalue complex prediction procedures and produce poor estimates of trial operating characteristics such as power and the number of enrolled patients. 
Validation analyses on the basis of completed trials and real world data support the selection of predictive models, interim analysis rules, and other aspects of PAIDs in future clinical trials.
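The interim futility rule mentioned in this abstract rests on a predictive probability. The sketch below shows the general idea in a deliberately simplified setting that is an assumption of this example, not the article's design: a single arm with a binary response, a Beta prior, and "success" defined as reaching a minimum number of responders at the final analysis. All names and thresholds are hypothetical.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(k, n, a, b):
    """P(k responders among n future patients) when the response rate ~ Beta(a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(a + k, b + n - k) - log_beta(a, b))

def predictive_probability_of_success(x, n, n_max, r_min, a=1.0, b=1.0):
    """Probability, given x responders among n patients so far, that the final
    count among n_max patients reaches at least r_min (the assumed success rule)."""
    remaining = n_max - n
    a_post, b_post = a + x, b + (n - x)  # posterior after the interim data
    needed = max(r_min - x, 0)           # future responders still required
    # An empty range (needed > remaining) correctly yields probability 0.
    return sum(beta_binomial_pmf(k, remaining, a_post, b_post)
               for k in range(needed, remaining + 1))

# Hypothetical interim look: 6 responders in 20 patients, 40 planned,
# at least 20 responders required at the end.
pp = predictive_probability_of_success(6, 20, 40, 20)
stop_for_futility = pp < 0.05  # illustrative threshold only
print(round(pp, 4), stop_for_futility)
```

A candidate plan in the sense of the abstract would fix choices such as the prediction model, the timing of the interim look, and the futility threshold; replaying such a rule on completed-trial data is the kind of validation the authors describe.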
- Research Article
19
- 10.1016/j.trc.2022.103946
- Nov 24, 2022
- Transportation Research Part C: Emerging Technologies
- Mathias Niemann Tygesen + 2 more
Predicting the supply and demand of transport systems is vital for efficient traffic management, control, optimization, and planning. For example, predicting where from/to and when people intend to travel by taxi can support fleet managers in distributing resources; better predictions of traffic speeds/congestion allows for pro-active control measures or for users to better choose their paths. Making spatio-temporal predictions is known to be a hard task, but recently Graph Neural Networks (GNNs) have been widely applied on non-Euclidean spatial data. However, most GNN models require a predefined graph, and so far, researchers rely on heuristics to generate this graph for the model to use. In this paper, we use Neural Relational Inference to learn the optimal graph for the model. Our approach has several advantages: 1) a Variational Auto Encoder structure allows for the graph to be dynamically determined by the data, potentially changing through time; 2) the encoder structure allows the use of external data in the generation of the graph; 3) it is possible to place Bayesian priors on the generated graphs to encode domain knowledge. We conduct experiments on two datasets, namely the NYC Yellow Taxi and the PEMS-BAY road traffic datasets. In both datasets, we outperform benchmarks and show performance comparable to state-of-the-art. Furthermore, we do an in-depth analysis of the learned graphs, providing insights on what kinds of connections GNNs use for spatio-temporal predictions in the transport domain and how these connections can help interpretability.
- Research Article
2
- 10.1093/comnet/cnac046
- Oct 27, 2022
- Journal of Complex Networks
- Patience Pokuaa Gambrah + 1 more
An individual’s productivity is strongly related to work- and non-work-related interactions. Thus, the literature on farmers’ productivity often explores single-layer networks that illustrate the single categories of social relationships. In this study, we investigated farmers’ productivity using a multiplex structure underlying social interaction networks. Relational data were obtained from farmers in four different categories of social relationships. The multiplex network was analysed by applying multiplex degree centrality and layer-by-layer comparison. Also, power and role were analysed through the use of external data by determining their intra-layer correlation. The findings show that diverse types of relationships exist together and they positively affect farmers’ productivity in multiple ways and enhance their innovation capacity. Only 6 out of the 73 farmers had high-degree centrality (> 10), with 18–63% relevance for the six farmers in the two layers—farming advice (FA) and loans (LO) layers—that the farmers considered important to their productivity. These farmers were more likely to be productive and help improve the productivity of others linked to them. Further, 62% of the edges in the social gathering and personal advice layers were similar, whereas only 3% of those in the FA and LO layers were similar, confirming the significance of the latter layers. The influence of social structures on farmers’ productivity implies that social connections enhance farmers’ confidence. The external data further confirm that the formation of some links depends on trust and power, whereas others do not.
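The two network summaries named in this abstract, multiplex degree centrality and layer-by-layer edge comparison, can be sketched on a toy example. The layer names echo the abstract, but the nodes and edges are invented, and the Jaccard index used for edge similarity is an assumption; the article may define similarity differently.

```python
def layer(*pairs):
    """Build an undirected layer as a set of frozenset edges."""
    return {frozenset(p) for p in pairs}

def multiplex_degree(layers):
    """Total degree of each node, summed over all layers."""
    degree = {}
    for edges in layers.values():
        for edge in edges:
            for node in edge:
                degree[node] = degree.get(node, 0) + 1
    return degree

def edge_overlap(edges_a, edges_b):
    """Jaccard share of edges present in both layers (assumed similarity measure)."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 0.0

# Toy multiplex network with invented farmers A-D.
layers = {
    "farming_advice": layer(("A", "B"), ("A", "C"), ("B", "C")),
    "loans": layer(("A", "B"), ("A", "D")),
}

print(multiplex_degree(layers))
print(edge_overlap(layers["farming_advice"], layers["loans"]))
```

In this toy network, farmer A has the highest multiplex degree (4), and the two layers share one of four distinct edges, giving an overlap of 0.25.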
- Research Article
22
- 10.1007/s40273-022-01164-4
- Jul 12, 2022
- PharmacoEconomics
- Becky Pennington + 3 more
Including health outcomes for carers as well as patients in economic evaluations can change the results and conclusions of the analysis. Whilst in many disease areas there can be clear justification for including carers’ health-related quality of life (HRQL) in health technology assessments (HTAs), we believe that, in general, the perspective of carers is under-represented in HTA. We were interested in the extent, and methods by which, HTA bodies include carers’ HRQL in economic evaluation. We reviewed guidance from 13 HTA bodies across the world regarding carers’ HRQL. We examined five interventions, as case studies, assessed by different HTA bodies, and extracted information on whether carers’ HRQL was included by the manufacturers or assessors in their dossiers of evidence, the data and methods used, and the impact on the results. We developed recommendations to guide analysts on including carers’ HRQL in economic evaluations. When reviewing the methods guides: two bodies recommend including carers’ HRQL in the base case, two referred to outcomes for all individuals, two preferred to exclude carers, three said it depended on other conditions, and it was unclear for four. Across the five case studies: five source studies for carers’ HRQL and two different modelling approaches were used. Including carers’ HRQL increased incremental quality-adjusted life-years (QALYs) in 19/23 analyses (decreased it in two); there was substantial variation in the magnitude of change. 
We recommend: (1) the inclusion of carers is clearly justified, (2) the use of HRQL data from the population under comparison where possible, (3) the use of data from another disease area or country is clearly justified (and transferability/applicability issues are discussed), (4) the use of external data to derive comparisons for cross-sectional data is justified, (5) assumptions and implications of the modelling approach are explicit, and (6) disaggregated results for patients and carers are presented.
- Research Article
2
- 10.1080/10543406.2022.2078346
- May 4, 2022
- Journal of biopharmaceutical statistics
- Evan Kwiatkowski + 3 more
We present a Bayesian framework for sequential monitoring that allows for use of external data, and that can be applied in a wide range of clinical trial applications. The basis for this framework is the idea that, in many cases, specification of priors used for sequential monitoring and the stopping criteria can be semi-algorithmic byproducts of the trial hypotheses and relevant external data, simplifying the process of prior elicitation. Monitoring priors are defined using the family of generalized normal distributions, which comprise a flexible class of priors, naturally allowing one to construct a prior that is peaked or flat about the parameter values thought to be most likely. External data are incorporated into the monitoring process through mixing an a priori skeptical prior with an enthusiastic prior using a weight that can be fixed or adaptively estimated. In particular, we introduce an adaptive monitoring prior for efficacy evaluation that dynamically weighs skeptical and enthusiastic prior components based on the degree to which observed data are consistent with an enthusiastic perspective. The proposed approach allows for prospective and pre-specified use of external data in the monitoring procedure. We illustrate the method for both single-arm and two-arm randomized controlled trials. For the latter case, we also include a retrospective analysis of actual trial data using the proposed adaptive sequential monitoring procedure. Both examples are motivated by completed pediatric trials, and the designs incorporate information from adult trials to varying degrees. Preposterior analysis and frequentist operating characteristics of each trial design are discussed.
- Research Article
15
- 10.1080/10543406.2021.2021227
- Jan 19, 2022
- Journal of Biopharmaceutical Statistics
- Hongfei Li + 2 more
Utilizing external data from the real world, including data from historical clinical trials, has received increasing interest in drug development. The use of external data to support drug evaluation in clinical trials has mainly been through using various matching methods for baseline characteristics to form external control arms in single-arm trials or to augment control arms of randomized controlled trials in hybrid approaches. However, matching the baseline characteristics between the trial and the external subjects can only guarantee comparability on the level of baseline characteristics. Differences in outcomes between the two data sources may still exist due to contemporaneous and operational characteristics. Similarity between the outcomes in the trial control and the external subjects with similar baseline characteristics can be critical in leveraging the external subjects in the clinical trials. In this paper, a resampling method for augmenting control arms in randomized controlled trials is proposed under the conditional borrowing framework. The new method establishes empirical distributions for the hazard ratio in outcomes between the external and trial control subjects. The borrowing decision is then derived from this empirical distribution using a measure of similarity. Once the borrowing decision is established, the borrowing weights for the external subjects, based on the similarity measure, are incorporated in the weighted partial likelihood to evaluate the treatment effect. The operating characteristics of the hybrid control arm, under both the conditional borrowing and unconditional borrowing frameworks, are evaluated. Simulation is conducted to evaluate Type I error, bias, and power. An illustrative example using simulated data is also presented.