Statistical Sampling Approach Research Articles

The 21st Century Cures Act of 2016 provided a framework to the US Food and Drug Administration (FDA) to rapidly move treatments to patients.1 The increased acceptability of real‐world data (RWD) sources allows for innovative ways to study products and has the potential to reduce trial costs. Published papers provide guidance regarding data quality issues, reproducibility, and validity assessment.2 The rapid evolution of electronic health records (EHRs) encourages greater consideration of their use in research.1, 2, 3, 4, 5, 6 For years, the FDA has relied on epidemiological studies of postapproval product safety using RWD5, 6 (eg, administrative claims and EHR) and for device effectiveness studies4; however, regulatory use for evaluating drug effectiveness has been rare. As part of the Prescription Drug User Fee Act (PDUFA VI),3 use of RWD is being considered for potential contributions to evaluating effectiveness and safety of new indications for approved products and to satisfy postapproval study requirements. Recently, the Duke Margolis Center for Health Policy held workshops and issued two papers on this topic.5, 6 The first paper focused on defining RWD as data routinely collected pertinent to patient health status and/or delivery of care, and on the use of RWD in regulatory and clinical contexts.5 The second white paper, from the October 1, 2018, workshop, focused on data relevancy and quality, including cleaning, transforming, and linking RWD to characterize RWD sources as “fit for regulatory purpose.”6 These papers offer a practical “commonsense” high‐level view of primary data and methods considerations for RWD use from a regulatory perspective, facilitating discussion around regulatory uses of RWD within the research community and industry. However, salient points are missing from the papers and the RWD discussions among FDA, researchers, and industry.
Here, we provide a commentary on the data considerations discussed in the white papers and highlight pertinent considerations with respect to RWD in the context of whether data are relevant, representative, and robust.

1.1. Data relevance

The recent white paper defines data relevance dimensions including representativeness of the population of interest, critical data field availability, accurate linking at the patient level with multiple data sources, and adequate sample size and follow‐up time to demonstrate expected treatment effects.6 Guidance from FDA on how to ensure RWD are fit for purpose and adequate to support regulatory decisions would be helpful on each dimension. Determining whether RWD are fit for regulatory purpose is a “contextual exercise” where the specific research question, regulatory use, and data characteristics drive what meaningful conclusions can be drawn.6 Covariates may be critical for one research question but not another. Exposures and outcomes should be well defined when part of the research question but may not be critical for natural history studies. There is no “one‐size‐fits‐all” approach, and critical data components should be evaluated for each research question and regulatory use.7 A framework is needed to guide choice and evaluation of critical data elements for specific research questions for regulatory use. Representativeness of the population of interest is gauged in many ways. Recent FDA guidance on Patient Focused Drug Development suggests a statistical sampling approach be used to obtain patient experience data representative of the target population.8 However, most US real‐world databases use administrative claims or EHR for patients seeking medical attention. These RWD sources should be considered broadly representative of the population eligible for using most, if not all, new products and services.
“Representativeness” should be assessed broadly in the context of likely product users with some diversity in geography, health status, and health care system as appropriate for the specific research question and regulatory context. While data linkage is likely to limit the eligible sample, it may be needed to increase the informative nature of RWD, especially with increasing evaluations to support precision medicine. Sample size should be derived based on anticipated treatment effects for studies of treatment effectiveness or safety, whether comparative or not, to ensure appropriate precision of estimates. For rare diseases, there should be flexibility given data sparseness worldwide, as indicated in the FDA guidance on rare disease.8 Additional guidance would be useful regarding how “accurate linking” should be assessed since linking 100% of patients with administrative claims and EHR is impractical. Would FDA accept limited linked data if it was supplemental to cruder variables in the full dataset? Would a subset of 60% be adequate? In the context of probabilistic linkage, what level of certainty would constitute adequate linkage? Salience of linkable individuals to the specific research question should be considered in this determination and pre‐specified sensitivity analyses should help assess robustness of results and conclusions.9, 10
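
As a rough illustration of deriving sample size from an anticipated treatment effect, the sketch below uses the textbook normal-approximation formula for a two-sided comparison of two means with equal group sizes. The function name `n_per_group`, the parameter names, and the example effect sizes are assumptions made for illustration; the commentary does not prescribe a particular formula, and a real study would need a calculation matched to its design and endpoint.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate patients per group for a two-sided, two-sample comparison of means.

    delta: anticipated difference in means (hypothetical input)
    sigma: assumed common standard deviation of the outcome
    alpha: two-sided type I error rate
    power: desired probability of detecting a true difference of size delta
    """
    if delta == 0:
        raise ValueError("delta must be nonzero")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)               # quantile matching the power
    # n = 2 * ((z_alpha + z_beta) * sigma / delta)^2, rounded up to whole patients
    return math.ceil(2.0 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# A moderate standardized effect (0.5 SD) versus a smaller one (0.25 SD).
n_moderate = n_per_group(delta=0.5, sigma=1.0)
n_small = n_per_group(delta=0.25, sigma=1.0)
print(n_moderate, n_small)
```

Halving the anticipated effect roughly quadruples the required sample, which is one reason data linkage that shrinks the eligible population can matter for studies of modest effects.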


Background: Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other.

Results: In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst-case time requirements of such an SCFG based sampling method without significant accuracy losses. If, on the other hand, the quality of sampled structures can be observed to strongly react to slight disturbances, there is little hope for improving the complexity by heuristic procedures. We hence provide a reliable test for the hypothesis that a heuristic method could be implemented to improve the time scaling of RNA secondary structure prediction in the worst case without sacrificing much of the accuracy of the results.

Conclusions: Our experiments indicate that absolute errors generally lead to the generation of useless sample sets, whereas relative errors seem to have only small negative impact on both the predictive accuracy and the overall quality of resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method guaranteeing an acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. This has indeed been indicated by the construction of prototype algorithms.
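
The contrast between relative and absolute errors on sampling probabilities can be illustrated with a toy discrete distribution. The sketch below is not the disturbance scheme used in the paper, which the abstract does not specify; the function names, the uniform noise model, and the example probabilities are all assumptions chosen only to show why the two error types behave differently for rare alternatives.

```python
import random


def disturb_relative(probs, max_rel_error, rng):
    """Scale each probability by a random factor in [1 - e, 1 + e], then renormalize.

    Assumes 0 <= max_rel_error < 1 so every factor stays positive.
    """
    noisy = [p * (1.0 + rng.uniform(-max_rel_error, max_rel_error)) for p in probs]
    total = sum(noisy)
    return [q / total for q in noisy]


def disturb_absolute(probs, max_abs_error, rng):
    """Add random noise in [-e, e] to each probability, clip at zero, then renormalize."""
    noisy = [max(0.0, p + rng.uniform(-max_abs_error, max_abs_error)) for p in probs]
    total = sum(noisy)
    if total == 0.0:
        # Degenerate case: all mass was clipped away, fall back to uniform.
        return [1.0 / len(probs)] * len(probs)
    return [q / total for q in noisy]


rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical choice probabilities: one dominant, one rare, one infeasible alternative.
exact = [0.9499, 0.05, 0.0001, 0.0]

rel = disturb_relative(exact, 0.10, rng)
abs_ = disturb_absolute(exact, 0.01, rng)

for name, dist in (("exact", exact), ("relative", rel), ("absolute", abs_)):
    print(name, [round(p, 6) for p in dist])
```

Under the relative scheme an infeasible alternative keeps probability zero and a rare one stays within a bounded factor of its exact value. Under the absolute scheme the same noise magnitude can be far larger than a rare probability itself and can give mass to an infeasible alternative, which is one intuition for why absolute errors might distort a sampled ensemble more severely.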


Articles published on Statistical Sampling Approach

Constrained Hybrid Monte Carlo Sampling Made Simple for Chemical Reaction Simulations.

Benchmarking laser scanning and terrestrial photogrammetry to extract forest inventory parameters in a complex temperate forest

Passive sampling in reproducing kernel Hilbert spaces using leverage scores

The Early Steps of Molecule-to-Material Conversion in Chemical Vapor Deposition (CVD): A Case Study.

Considerations in characterizing real-world data relevance and quality for regulatory purposes: A commentary.

Uncertainty quantification for radio interferometric imaging: II. MAP estimation

Uncertainty quantification for radio interferometric imaging – I. Proximal MCMC methods

Cloud Cover Assessment for Operational Crop Monitoring Systems in Tropical Areas

Understanding Anharmonicity in fcc Materials: From its Origin to ab initio Strategies beyond the Quasiharmonic Approximation.

Statistical sampling approaches for soil monitoring

SU‐E‐J‐62: Estimating Plausible Treatment Course Dose Distributions by Accounting for Registration Uncertainty and Organ Motion

Validation of Geant4-Based Radioactive Decay Simulation

A statistical sampling approach for measurement of fracture toughness parameters in a 4330 steel by 3-D femtosecond laser-based tomography

Sampling of illicit drugs for quantitative analysis. Part I: Heterogeneity study of illicit drugs in Europe

Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures.

An Optimization-Based Sampling Scheme for Phylogenetic Trees

Data driven surrogate-based optimization in the problem solving environment WBCSim

INFOCARB: A regional scale forest carbon inventory (Provincia Autonoma di Trento, Southern Italian Alps)

Evaluation of photovoltaic modules based on sampling inspection using smoothed empirical quantiles

Software for generating liability distributions for pedigrees conditional on their observed disease states and covariates
