Background
Novel precision medicine therapeutics target increasingly granular, genomically defined populations. Such rare subgroups are challenging to study within a clinical trial or a single real-world data (RWD) source; pooling disparate RWD sources may therefore be required for feasibility. Heterogeneity assessment for pooled data is particularly complex when contrasting a pooled real-world comparator cohort (rwCC) with a single-arm clinical trial (SAT), because the individual comparisons are not independent: each compares a rwCC to the same SAT. Our objective was to develop a methodological framework for pooling RWD focused on the rwCC use case, and to simulate novel approaches to heterogeneity assessment, especially for small datasets.

Methods
We present a framework with the following steps: pre-specification, assessment of dataset eligibility, and outcome analyses (including assessment of outcome heterogeneity). We then simulated heterogeneity assessments for a binary response outcome in a SAT compared to two rwCCs using three approaches: standard meta-analytic methods, an adjusted Cochran’s Q test, and direct comparison of the individual participant data (IPD) from the rwCCs.

Results
The adjusted Cochran’s Q test and the IPD method had identical power to detect a true difference, and both were superior to a standard Cochran’s Q test. When assessing the impact of heterogeneity in the null scenario of no difference between the SAT and rwCCs, a lack of statistical power led to Type 1 error inflation. Similarly, in the alternative scenario of a true difference between the SAT and rwCCs, we found substantial Type 2 error, with underpowered heterogeneity testing leading to underestimation of the treatment effect.

Conclusions
We developed a methodological framework for pooling RWD sources in the context of designing a rwCC for a SAT. When testing for heterogeneity during this process, the adjusted Cochran’s Q test matches the statistical power of IPD heterogeneity testing. The limitations of quantitative heterogeneity testing in protecting against Type 1 or Type 2 error indicate that these tests are best used descriptively, and only after careful selection of datasets based on clinical and data considerations. We hope these findings will facilitate the rigorous pooling of RWD to unlock insights that benefit oncology patients.
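To illustrate why the shared SAT arm matters, the following minimal sketch contrasts a standard Cochran’s Q test with one plausible shared-arm adjustment. It is not the authors’ implementation: the abstract does not define the adjusted test, so the adjustment shown (removing the SAT variance, which cancels when the two risk differences are subtracted) is an assumption, as are the function names and the made-up counts. With only two rwCCs, this assumed adjustment reduces to a direct Wald comparison of the two rwCC response rates.

```python
import math


def chi2_sf_1df(q):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2.0))


def prop_var(events, n):
    """Wald variance of a proportion (zero if the proportion is 0 or 1)."""
    p = events / n
    return p * (1.0 - p) / n


def standard_q(sat, rw1, rw2):
    """Cochran's Q treating the two SAT-vs-rwCC risk differences as independent.

    Each argument is an (events, n) tuple.
    """
    v_sat = prop_var(*sat)
    p_sat = sat[0] / sat[1]
    diffs = [p_sat - rw[0] / rw[1] for rw in (rw1, rw2)]
    variances = [v_sat + prop_var(*rw) for rw in (rw1, rw2)]
    weights = [1.0 / v for v in variances]
    pooled = sum(w * d for w, d in zip(weights, diffs)) / sum(weights)
    q = sum(w * (d - pooled) ** 2 for w, d in zip(weights, diffs))
    return q, chi2_sf_1df(q)


def shared_arm_q(rw1, rw2):
    """Assumed adjustment: the SAT response rate cancels in the difference of the
    two risk differences, so only the rwCC variances remain in the denominator."""
    p1, p2 = rw1[0] / rw1[1], rw2[0] / rw2[1]
    q = (p1 - p2) ** 2 / (prop_var(*rw1) + prop_var(*rw2))
    return q, chi2_sf_1df(q)


# Illustrative counts only (responders, cohort size); not data from the study.
sat, rw1, rw2 = (30, 60), (10, 50), (20, 50)
q_std, p_std = standard_q(sat, rw1, rw2)
q_adj, p_adj = shared_arm_q(rw1, rw2)
print(f"standard Q = {q_std:.3f}, p = {p_std:.3f}")
print(f"shared-arm Q = {q_adj:.3f}, p = {p_adj:.3f}")
```

In this sketch the standard statistic counts the SAT variance twice in its denominator, so it is always smaller than the shared-arm statistic for the same data. That is consistent with the lower power reported above for the standard test, although the simulation design used in the study may differ.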