- Discussion
- 10.1080/19466315.2025.2606333
- Dec 23, 2025
- Statistics in Biopharmaceutical Research
- Weidong Zhang + 8 more
Overall survival (OS) is considered a gold standard clinical endpoint for evaluating the effectiveness of a drug. In oncology studies, OS is relatively simple and straightforward to measure in a clinical trial. Recent advancements in medical sciences and cancer care have significantly prolonged the lifespan of cancer patients. As a result, it is becoming challenging to measure OS due to its long course in some cancers. In addition, it may be challenging to interpret OS in some circumstances; for example, when patients switch from the control arm to the experimental arm to receive a novel therapy, a detrimental effect might be observed in OS in the experimental arm with uncertainty, or a discrepancy between OS and other clinical endpoints can be observed. This manuscript will provide an overview of the challenges of using OS as a clinical endpoint in pivotal oncology trials. Discussions will focus on using OS as both a safety and efficacy endpoint for decision-making.
- Research Article
- 10.1080/19466315.2025.2579549
- Dec 23, 2025
- Statistics in Biopharmaceutical Research
- Rakhi Kilaru + 9 more
The recently released third draft version of ICH E6(R3) places great emphasis on Risk-Based Quality Management (RBQM) principles and includes the concept of Quality Tolerance Limits (QTLs), which are regarded as an example of predefined acceptable ranges that, if exceeded, might affect participants' safety or the reliability of trial results. This change allows for greater flexibility and adaptability in managing quality and risks in clinical trials, leading to more effective and efficient trials. In this paper, we conduct simulations to evaluate statistical methods, including statistical process control and Bayesian methods, for implementing QTLs in clinical trials. We evaluate operating characteristics such as average run length, alarm rate, false alarm rate, and other performance metrics. Generally, all methods performed better with larger sample sizes and higher expected probabilities. There was greater variability in performance across methods early in the review cycle when sample sizes were small. Statistical process control methods performed better in most scenarios, while Bayesian methods were more effective at detecting an out-of-control process earlier for lower expected probabilities. Not all scenarios could be investigated; thus, method selection depends on factors like assumptions, statistical complexity, and feasibility.
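As a rough illustration of what a statistical process control check for a QTL can look like, the sketch below uses a Shewhart p-chart, one of the simplest tools in that family. The abstract does not say which control charts were evaluated, so the chart type, the 3-sigma multiplier, and the function names `p_chart_upper_limit` and `signals_excursion` are all assumptions made for illustration.

```python
import math


def p_chart_upper_limit(expected_rate, n, sigma_multiplier=3.0):
    """Upper control limit of a Shewhart p-chart after n participants.

    expected_rate is the assumed in-control event probability (hypothetical
    input); sigma_multiplier=3.0 is the conventional choice, not a value
    taken from the paper.
    """
    if not 0.0 < expected_rate < 1.0:
        raise ValueError("expected_rate must lie strictly between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    standard_error = math.sqrt(expected_rate * (1.0 - expected_rate) / n)
    # Cap at 1 because a proportion cannot exceed 100%.
    return min(1.0, expected_rate + sigma_multiplier * standard_error)


def signals_excursion(events, n, expected_rate, sigma_multiplier=3.0):
    """Return True when the observed proportion exceeds the upper limit."""
    return events / n > p_chart_upper_limit(expected_rate, n, sigma_multiplier)


if __name__ == "__main__":
    # Hypothetical review: 100 participants, 10% expected event rate.
    print(p_chart_upper_limit(0.10, 100))
    print(signals_excursion(20, 100, 0.10))
```

Because the standard error shrinks as `n` grows, the limit is wide at early reviews and tightens later, which is consistent with the abstract's remark that performance varies most when sample sizes are small.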
- Research Article
- 10.1080/19466315.2025.2573322
- Dec 23, 2025
- Statistics in Biopharmaceutical Research
- Florian Lasch + 2 more
For handling intercurrent events in clinical trials, one of the strategies outlined in the ICH E9(R1) addendum targets a hypothetical scenario where an intercurrent event would not occur. While this strategy is often implemented by setting data after the intercurrent event to “missing” even if they have been collected, g-estimation allows for a more efficient estimation by using the information contained in post-intercurrent-event data. As g-estimation methods have largely been developed outside of randomized clinical trials, optimizations for their application in clinical trials are possible. In this article, we describe and investigate the performance of modifications to the established g-estimation methods, leveraging the assumption that some intercurrent events are expected to have the same impact on the outcome regardless of the timing of their occurrence. In a simulation study in Alzheimer’s disease, the modifications show a substantial efficiency advantage for the estimation of an estimand that applies the hypothetical strategy to the use of symptomatic treatment while retaining approximate unbiasedness and adequate Type I error control.
- Research Article
- 10.1080/19466315.2025.2579553
- Dec 21, 2025
- Statistics in Biopharmaceutical Research
- Minghua Shan
Confirmatory cancer clinical trials in chronic or indolent diseases often use imaging endpoints as a primary measure of efficacy due to relatively long survival time. Progression-free survival (PFS) is often such an image-based endpoint. A substantial increase in PFS in the absence of severe toxicities may be considered a meaningful clinical benefit. In these trials, overall survival (OS) is often a secondary or exploratory endpoint with low or unknown statistical power. Due to relatively long OS, few OS events occur at the time of a trial’s primary completion (e.g., the primary PFS analysis). Additionally, unlike the primary endpoint, analyses of OS are often not well planned and described in the protocol. All these make it challenging to interpret OS results in order to determine OS benefit or detriment. However, OS is an ultimate measure of safety as well as efficacy. We present two methods for planning analyses of OS for safety evaluation: a three-outcome and a two-outcome procedure. They can be used to plan OS safety analyses so that sufficient data are available to provide at least a minimum level of information required to rule out a substantial detriment. They also provide guidelines for interpreting OS results.
- Research Article
- 10.1080/19466315.2025.2602449
- Dec 10, 2025
- Statistics in Biopharmaceutical Research
- Jonathon Vallejo + 1 more
Interim monitoring is important in mitigating risk in clinical trials. In oncology, there has been increasing interest in monitoring potential detrimental treatment effects, particularly on overall survival (OS). However, little research exists on operating characteristics of interim monitoring when harm is present. It is unclear how early one might need to monitor for harm and with what threshold. To address these questions, we calculated optimal interim timings and boundaries through a constrained optimization framework given a fixed power loss when harm is present. Our simulations focus on monitoring oncology trials with OS as the primary endpoint. We explored various optimization criteria including minimizing harm, futility, or a combination of these outcomes given multiple levels of power loss. Simulation results for single and multiple interim analyses are provided. Results suggest that the initial interim necessary for mitigating harm is often earlier than usually implemented (e.g., 20% information fraction) and that the threshold for such a look can be more aggressive than typically considered. Implementation of such boundaries is subject to practical considerations, which are discussed throughout.
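To make the trade-off between interim timing and threshold concrete, the following sketch approximates the probability that an observed hazard ratio (HR) crosses a harm boundary at a single interim look. It is not the constrained optimization described in the abstract. It assumes 1:1 randomization, an HR above 1 meaning harm in the experimental arm, and the usual large-sample approximation that the log of the observed HR is normal with variance 4 divided by the number of events. The function name and the inputs are hypothetical.

```python
import math
from statistics import NormalDist


def prob_cross_harm_boundary(true_hr, boundary_hr, events):
    """Approximate chance that the observed HR exceeds boundary_hr at a look.

    Assumes 1:1 randomization, so log(observed HR) is roughly normal with
    mean log(true_hr) and variance 4 / events.
    """
    if true_hr <= 0 or boundary_hr <= 0 or events <= 0:
        raise ValueError("hazard ratios and events must be positive")
    standard_error = math.sqrt(4.0 / events)
    z = (math.log(boundary_hr) - math.log(true_hr)) / standard_error
    return 1.0 - NormalDist().cdf(z)


if __name__ == "__main__":
    # Hypothetical look after 60 deaths with a stopping boundary at HR = 1.2.
    for hr in (0.75, 1.0, 1.3):
        print(hr, prob_cross_harm_boundary(hr, 1.2, 60))
```

Evaluating the function under a beneficial, null, and harmful true HR shows how often an early look would stop a trial correctly versus incorrectly, which is the kind of operating characteristic the abstract refers to.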
- Research Article
- 10.1080/19466315.2025.2581125
- Nov 26, 2025
- Statistics in Biopharmaceutical Research
- Yunlong Yang + 2 more
The primary objective of Phase I oncology trials is to assess the safety and tolerability of novel therapeutics. Conventional dose escalation methods identify the maximum tolerated dose (MTD) based on dose-limiting toxicity (DLT). However, as cancer therapies have evolved from chemotherapy to targeted therapies, these traditional methods have become problematic. Many targeted therapies rarely produce DLT and are administered over multiple cycles, potentially resulting in the accumulation of lower-grade toxicities, which can lead to intolerance, such as dose reduction or interruption. To address this issue, we proposed dual-criterion designs that find the MTD based on both DLT and non-DLT-caused intolerance. We considered the model-based design and model-assisted design that allow real-time decision-making in the presence of pending data due to long event assessment windows. Compared to DLT-based methods, our approaches exhibit superior operating characteristics when intolerance is the primary driver for determining the MTD and comparable operating characteristics when DLT is the primary driver.
- Research Article
- 10.1080/19466315.2025.2581123
- Nov 26, 2025
- Statistics in Biopharmaceutical Research
- Arnab Kumar Maity
In an oncology Phase I trial where the primary objective is to find the maximum tolerated dose (MTD) or the recommended phase II dose (RP2D), a common practice is to classify toxicities into two categories: dose-limiting toxicity (DLT) and non-DLT. The dose escalation process is carried out by testing each dose in a few participants (usually 3). This process can be guided by any of several statistical methods, such as 3 + 3, the continuous reassessment method (CRM), or the Bayesian logistic regression model (BLRM). Note that the current Common Terminology Criteria for Adverse Events (CTCAE) guidelines classify adverse events (AEs) into 5 grades, ranging from mild AE to death related to an AE. When these grades are collapsed into two categories, DLT or no DLT, significant information is lost. This can be mitigated by fitting an ordinal model to the data, which is equivalent to developing an ordinal CRM method. This work proposes a Bayesian ordinal CRM method which can be used to compute the AE grade probabilities at each dose level. The resulting information can then be communicated to the investigating study team to decide on the next dose in a more effective manner. The developed Bayesian method is described using a simulation study and real data case studies. The full implementation is wrapped into an R package, BayesOrdCRM, available from GitHub.
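The idea of computing grade probabilities at each dose can be illustrated with a generic proportional-odds (cumulative logit) model. This is a sketch under stated assumptions, not the BayesOrdCRM implementation: the abstract does not give the model form, and the cutpoints, slope, dose scale, and the function name `grade_probabilities` are hypothetical. In a Bayesian analysis the parameters would be drawn from a posterior rather than fixed as they are here.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def grade_probabilities(dose, cutpoints, slope):
    """Probabilities of grades 0..K under a proportional-odds model.

    The model assumed here is P(grade >= k | dose) = expit(cutpoints[k-1] +
    slope * dose) for k = 1..K, with strictly decreasing cutpoints so that
    the exceedance probabilities are ordered. Grade 0 means no adverse event.
    """
    if any(a <= b for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly decreasing")
    exceed = [expit(a + slope * dose) for a in cutpoints]
    # P(grade >= 0) is 1 and P(grade >= K + 1) is 0 by definition.
    bounds = [1.0] + exceed + [0.0]
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]


if __name__ == "__main__":
    # Hypothetical parameters for five CTCAE grades on a standardized dose.
    cutpoints = [1.0, 0.0, -1.0, -2.5, -4.0]
    for dose in (0.0, 0.5, 1.0):
        print(dose, [round(p, 3) for p in grade_probabilities(dose, cutpoints, 1.2)])
```

Summing the entries for grade 3 and above at each dose gives one possible summary for a dose decision, while the lower grades remain visible instead of being folded into a single "no DLT" category.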
- Research Article
- 10.1080/19466315.2025.2565155
- Nov 19, 2025
- Statistics in Biopharmaceutical Research
- Kaifeng Lu + 6 more
Overall survival (OS) is the gold standard in cancer drug development for both efficacy and safety. In indolent cancers where patients live a long time, it may be infeasible to set up a phase 3 study powered for OS, which poses challenges as to how to evaluate OS and monitor for a potential detrimental effect on OS. This article proposes a pragmatic framework for monitoring OS for potential detriment in indolent cancers, by extending existing methods to incorporate calendar-based monitoring and prediction and by reducing variability in model-based predictions. The proposed method can also handle nonproportional hazards and offers considerable flexibility in the choice of models for the distributions of events and drop-outs. Practical considerations are provided on the selection of models. The framework is illustrated with a practical example.
- Research Article
- 10.1080/19466315.2025.2590682
- Nov 17, 2025
- Statistics in Biopharmaceutical Research
- Feng Tian + 4 more
Ensuring diversity in clinical trials is critical for understanding treatment effects across different populations. This paper explores innovative statistical strategies to enhance the representation of underrepresented racial and ethnic groups in clinical research. We review Bayesian borrowing methods in single-arm trials, emphasizing their potential to leverage historical data or real-world data (RWD) while addressing risks of bias. In the context of randomized clinical trials (RCTs), we discuss adaptive enrichment and hybrid designs as approaches to mitigate demographic disparities while maintaining scientific rigor. Beyond trial design innovations, the integration of RWD offers opportunities to supplement evidence and improve inclusivity. However, challenges such as data quality, selection bias, and endpoint comparability must be carefully addressed. We present a hypothetical case study demonstrating Bayesian borrowing in a post-market setting to illustrate its practical implications.
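One common form of Bayesian borrowing in a single-arm trial with a binary endpoint is the power prior, in which historical data are down-weighted by a discount factor between 0 and 1. The abstract does not say which borrowing methods are reviewed, so the choice of a power prior with a conjugate Beta prior, the function names, and the numbers below are assumptions for illustration only.

```python
def power_prior_posterior(events, n, hist_events, hist_n, discount,
                          prior_a=1.0, prior_b=1.0):
    """Beta posterior parameters for a response rate with discounted borrowing.

    discount = 0 ignores the historical data and discount = 1 pools it fully.
    The Beta(prior_a, prior_b) initial prior is an assumption of this sketch.
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must lie between 0 and 1")
    if not (0 <= events <= n and 0 <= hist_events <= hist_n):
        raise ValueError("event counts must not exceed sample sizes")
    a = prior_a + discount * hist_events + events
    b = prior_b + discount * (hist_n - hist_events) + (n - events)
    return a, b


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


if __name__ == "__main__":
    # Hypothetical data: 12/40 responders now, 30/100 in a historical source.
    for discount in (0.0, 0.5, 1.0):
        a, b = power_prior_posterior(12, 40, 30, 100, discount)
        print(discount, a, b, beta_mean(a, b))
```

The discount factor is where the risk of bias mentioned in the abstract enters: a larger value yields a tighter posterior but relies more heavily on the historical population being comparable to the current one.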
- Research Article
- 10.1080/19466315.2025.2587046
- Nov 13, 2025
- Statistics in Biopharmaceutical Research
- Miao Yang + 2 more
Event size re-estimation (ESR) is a natural extension of sample size re-estimation (SSR) to clinical trials with a time-to-event endpoint. Although ESR and SSR share the same approaches to Type I error control, a survival endpoint is more complicated than continuous and binary ones. We examine the popular methods for controlling the Type I error rate under ESR. Moreover, we propose a specification for incorporating stratification factors into the combination test for clinical trials with data from different stages. The properties of all the above methods are thoroughly studied and discussed.
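A combination test across stages can be sketched with the inverse normal method, which combines stage-wise one-sided p-values using weights fixed before the interim. This is a generic two-stage illustration, not the stratified specification proposed in the paper, and it sets aside the survival-specific question of how stage-wise statistics are constructed. The function name and the weighting by planned information fraction are assumptions.

```python
import math
from statistics import NormalDist


def inverse_normal_combination(p_stage1, p_stage2, planned_fraction_stage1=0.5):
    """Combine two one-sided stage-wise p-values with prespecified weights.

    The weights are the square roots of the planned information fractions,
    so their squares sum to 1. They must be fixed in advance and must not be
    updated after the event size is re-estimated.
    """
    if not (0.0 < p_stage1 < 1.0 and 0.0 < p_stage2 < 1.0):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if not 0.0 < planned_fraction_stage1 < 1.0:
        raise ValueError("planned fraction must lie strictly between 0 and 1")
    std_normal = NormalDist()
    w1 = math.sqrt(planned_fraction_stage1)
    w2 = math.sqrt(1.0 - planned_fraction_stage1)
    z1 = std_normal.inv_cdf(1.0 - p_stage1)
    z2 = std_normal.inv_cdf(1.0 - p_stage2)
    z = w1 * z1 + w2 * z2
    return z, 1.0 - std_normal.cdf(z)


if __name__ == "__main__":
    # Hypothetical stage-wise p-values with equal planned information.
    print(inverse_normal_combination(0.10, 0.02))
```

Keeping the weights fixed is what preserves the Type I error rate: under the null hypothesis, and with independent stage-wise p-values, the combined statistic remains standard normal regardless of how many events the second stage ends up containing.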