Conditioning on posterior samples for flexible frequentist goodness-of-fit testing
Summary
Tests of goodness of fit are used in nearly every domain where statistics is applied. One powerful and flexible approach is to sample artificial data sets that are exchangeable with the real data under the null hypothesis (but not under the alternative), as this allows the analyst to conduct a valid test using any test statistic they desire. Such sampling is typically done by conditioning on either an exact or approximate sufficient statistic, but existing methods for doing so have significant limitations, which either preclude their use or substantially reduce their power or computational tractability for many important models. In this paper, we propose to condition on samples from a Bayesian posterior distribution, which constitute a very different type of approximate sufficient statistic than those considered in prior work. Our approach, approximately co-sufficient sampling via Bayes, considerably expands the scope of this flexible type of goodness-of-fit testing. We prove the approximate validity of the resulting test, and demonstrate its utility on three common null models where no existing methods apply, as well as its outperformance on models where existing methods do apply.
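As a minimal sketch of the generic testing step described above (not the paper's sampling procedure itself), the following assumes the exchangeable copies have already been generated by some method and only shows how a p-value is formed from an arbitrary test statistic. The function name `exchangeable_pvalue` and its arguments are hypothetical.

```python
def exchangeable_pvalue(t_obs, t_copies):
    """Monte Carlo p-value from test statistics of exchangeable copies.

    t_obs: statistic computed on the real data.
    t_copies: the same statistic computed on each artificial data set
    (larger values are taken to mean more evidence against the null).
    """
    t_copies = list(t_copies)
    # Count copies at least as extreme as the observed value.
    n_extreme = sum(1 for t in t_copies if t >= t_obs)
    # The +1 terms count the real data set among the exchangeable draws.
    return (1 + n_extreme) / (1 + len(t_copies))


# Hypothetical statistics from four artificial data sets.
p = exchangeable_pvalue(3.0, [1.0, 2.0, 3.0, 4.0])
print(p)
```

If the copies are exactly exchangeable with the data under the null, a p-value of this form is valid for any choice of statistic; when exchangeability holds only approximately, the validity is correspondingly approximate.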
- Supplementary Content
- 10.25440/smu.12310127.v1
- Jul 1, 2018
- Figshare
My dissertation consists of three essays which contribute new theoretical results to Bayesian econometrics. Chapter 2 proposes a new Bayesian test statistic to test a point null hypothesis based on a quadratic loss. The proposed test statistic may be regarded as the Bayesian version of the Lagrange multiplier test. Its asymptotic distribution is obtained under a set of regularity conditions and follows a chi-squared distribution when the null hypothesis is correct. The new statistic has several important advantages that make it appealing in practical applications. First, it is well-defined under improper prior distributions. Second, it avoids Jeffreys-Lindley’s paradox. Third, it always takes a non-negative value and is relatively easy to compute, even for models with latent variables. Fourth, its numerical standard error is relatively easy to obtain. Finally, it is asymptotically pivotal and its threshold values can be obtained from the chi-squared distribution. Chapter 3 proposes a new Wald-type statistic for hypothesis testing based on Bayesian posterior distributions. The new statistic can be interpreted as a posterior version of the Wald test and has several desirable properties. First, it is well-defined under improper prior distributions. Second, it avoids Jeffreys-Lindley’s paradox. Third, under the null hypothesis and repeated sampling, it follows a chi-squared distribution asymptotically, offering an asymptotically pivotal test. Fourth, it only requires inverting the posterior covariance for the parameters of interest. Fifth and perhaps most importantly, when a random sample from the posterior distribution (such as an MCMC output) is available, the proposed statistic can be easily obtained as a by-product of posterior simulation. In addition, the numerical standard error of the estimated proposed statistic can be computed based on the random sample. The finite-sample performance of the statistic is examined in Monte Carlo studies.
Chapter 4 proposes a quasi-Bayesian approach for structural parameters in finite-horizon life-cycle models. This approach circumvents the numerical evaluation of the gradient of the objective function and alleviates the local optimum problem. The asymptotic normality of the estimators with and without approximation errors is derived. The proposed estimators reach the efficiency bound in the generalized method of moments (GMM) framework. Both the estimators and the corresponding asymptotic covariance are readily computable. The estimation procedure is easy to parallelize, so a graphics processing unit (GPU) can be used to increase computational speed. The estimation procedure is illustrated using a variant of the model in Gourinchas and Parker (2002). The dissertation comprises 3 papers, available from: 1. A Bayesian chi-squared test for hypothesis testing (2015) Journal of Econometrics, 189 (1), 54-69. 2. A posterior-based Wald-type statistic for hypothesis testing (2018) working paper 3. Estimating Finite-Horizon Life-Cycle Models: A Quasi-Bayesian Approach (2017) working paper
- Research Article
5
- 10.1016/j.bjae.2019.03.006
- May 14, 2019
- BJA Education
Hypothesis tests
- Research Article
13
- 10.1016/j.jeconom.2021.11.003
- Nov 29, 2021
- Journal of Econometrics
Posterior-based Wald-type statistics for hypothesis testing
- Research Article
4
- 10.2139/ssrn.3184330
- Jan 1, 2018
- SSRN Electronic Journal
A Posterior-Based Wald-Type Statistic for Hypothesis Testing
- Research Article
24
- 10.1016/j.jspi.2013.08.011
- Aug 22, 2013
- Journal of Statistical Planning and Inference
A prior-free framework of coherent inference and its derivation of simple shrinkage estimators
- Research Article
6
- 10.1016/j.ajhg.2009.10.006
- Nov 1, 2009
- The American Journal of Human Genetics
ATRIUM: Testing Untyped SNPs in Case-Control Association Studies with Related Individuals
- Research Article
12
- 10.1080/03610926.2020.1790004
- Jul 10, 2020
- Communications in Statistics - Theory and Methods
While empirical Bayes methods thrive in the presence of the thousands of simultaneous hypothesis tests in genomics and other large-scale applications, significance tests and confidence intervals are considered more appropriate for small numbers of tested hypotheses. Indeed, for fewer hypotheses, there is more uncertainty in empirical Bayes estimates of the prior distribution. Confidence intervals have been used to propagate the uncertainty in the prior to empirical Bayes inference about a parameter, but only by combining a Bayesian posterior distribution with a confidence distribution. Combining distributions of both types has also been used to combine empirical Bayes methods and confidence intervals for estimating a parameter of interest. To clarify the foundational status of such combinations, the concept of an evidential model is proposed. In the framework of evidential models, both Bayesian posterior distributions and confidence distributions are special cases of evidential support distributions. Evidential support distributions, by quantifying the sufficiency of the data as evidence, leverage the strengths of Bayesian posterior distributions and confidence distributions for cases in which each type performs well and for cases benefiting from the combination of both. Evidential support distributions also address problems of bioequivalence, bounded parameters, and the lack of a unique confidence distribution.
- Research Article
- 10.1080/03610926.2020.1828465
- Oct 5, 2020
- Communications in Statistics - Theory and Methods
We propose a frequentist testing procedure that maintains a defined coverage and is optimal in the sense that it gives maximal power to detect deviations from a null hypothesis when the alternative to the null hypothesis is sampled from a pre-specified distribution (the prior distribution). Selecting a prior distribution allows the decision rule to be tuned. This leads to increased power if the true data generating distribution happens to be compatible with the prior. It comes at the cost of losing power if the data generating distribution or the observed data are incompatible with the prior. We illustrate the proposed approach for a binomial experiment, which is sufficiently simple that the decision sets can be illustrated in figures, which should facilitate an intuitive understanding. The potential beyond the simple example will be discussed: the approach is generic in that the test is defined based on the likelihood function and the prior only. It is comparatively simple to implement and efficient to execute, since it does not rely on Minimax optimization. Conceptually it is interesting to note that the Bayesian posterior probability distribution is used for constructing the testing procedure.
- Research Article
44
- 10.1037/met0000248
- Oct 1, 2020
- Psychological Methods
The p value is still misinterpreted as the probability that the null hypothesis is true. Even psychologists who correctly understand that p values do not provide this probability may not realize the degree to which p values differ from the probability that the null hypothesis is true. Importantly, previous research on this topic has not addressed the influence of multiple testing, often a reality in psychological studies, and has not extensively considered the influence of different prior probabilities favoring the null and alternative hypotheses. Simulation studies are presented that emphasize the magnitude by which p values are distinct from the posterior probability that the null hypothesis is true, under an extensive set of conditions including multiple testing. Particular emphasis is placed on p values just under .05, given the prevalence of these p values in the published literature, though p values in other intervals are also assessed. In diverse conditions, results indicate that posterior probabilities favoring the null hypothesis are often far removed from .05, and this pattern quickly gets much worse when multiple testing is conducted. Rather than simply telling researchers that p values do not reflect the probability favoring the null hypothesis, as has been done previously, the results presented here allow psychologists to see the evidence provided by various p values. These results have particularly topical implications for the replication crisis, for how much weight should be placed on a single study, and for how the term statistical significance should be interpreted, particularly in conditions typical in psychological research.
- Research Article
7
- 10.1002/bimj.201700209
- Sep 2, 2018
- Biometrical Journal
A multistage single arm phase II trial with binary endpoint is considered. Bayesian posterior probabilities are used to monitor futility in interim analyses and efficacy in the final analysis. For a beta-binomial model, decision rules based on Bayesian posterior probabilities are converted to "traditional" decision rules in terms of number of responders among patients observed so far. Analytical derivations are given for the probability of stopping for futility and for the probability to declare efficacy. A workflow is presented on how to select the parameters specifying the Bayesian design, and the operating characteristics of the design are investigated. It is outlined how the presented approach can be transferred to statistical models other than the beta-binomial model.
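The conversion from a posterior-probability rule to a responder-count rule can be sketched as follows. This is an illustration under stated assumptions, not the paper's design: the prior parameters `a` and `b` are taken to be integers (so the Beta tail probability reduces to a binomial sum and no special functions are needed), and the function names, the response-rate bound `p0`, and the cutoff `threshold` are hypothetical.

```python
from math import comb


def posterior_prob_exceeds(r, n, p0, a=1, b=1):
    """P(p > p0 | r responders among n) under a Beta(a, b) prior.

    Assumes integer a and b, so the Beta(a + r, b + n - r) tail probability
    can be written as a binomial sum.
    """
    A, B = a + r, b + n - r
    N = A + B - 1
    # For integer A, B: P(Beta(A, B) > p0) = P(Binomial(N, p0) <= A - 1).
    return sum(comb(N, j) * p0**j * (1 - p0) ** (N - j) for j in range(A))


def min_responders(n, p0, threshold, a=1, b=1):
    """Smallest r with P(p > p0 | r, n) >= threshold, or None if none exists."""
    for r in range(n + 1):
        if posterior_prob_exceeds(r, n, p0, a, b) >= threshold:
            return r
    return None


# Hypothetical final analysis: 20 patients, efficacy declared when the
# posterior probability that the response rate exceeds 0.2 is at least 0.9.
print(min_responders(n=20, p0=0.2, threshold=0.9))
```

Because the posterior probability is increasing in the number of responders, the rule "posterior probability at least `threshold`" is equivalent to "at least `min_responders(...)` responders", which is the kind of traditional decision rule the abstract refers to.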
- Research Article
1
- 10.2174/1874297100801010048
- Oct 30, 2008
- The Open Epidemiology Journal
Two basic formulas, for the mean and variance of the number of effects in an epidemiological cohort, are derived. The formula for variance shows extra-binomial variation or overdispersion when there is correlated uncertainty of the probability of an effect. The formulas were validated by a numerical Monte Carlo study. The method of including epistemic uncertainty discussed by Hofer (E. Hofer, Health Physics, 2007) is generalized to include separately uncertainty from a Bayesian posterior distribution when the prior is known, and uncertainty of the prior. In this note, two basic formulas, for the mean and variance of the number of effects in an epidemiological cohort, are derived. It is conventionally assumed that the number of effects has either a Poisson or binomial distribution (1). The formula for the variance given here reduces to the binomial result in the case of no correlations of the probability of an effect, but shows extra-binomial variation or overdispersion when there are correlations. In practice, the variance could be calculated using the method advocated by Hofer (2), where, for each individual in an epidemiological cohort, some number j = 1…M of alternate realizations of the dose and hence the probability of an effect, taking into account possible correlations, are generated using Monte Carlo. This method is generalized here to include separately uncertainty from a Bayesian posterior distribution when the prior is known, and uncertainties caused by lack of knowledge of the prior. This is because, within a linear dose-effect response model, the average number of effects is proportional to the posterior-average-collective dose, and the important uncertainty is that of the posterior-average-collective dose caused by lack of knowledge of the prior.
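The structure of such formulas can be sketched with the law of total variance. The notation here is an assumption made for illustration and need not match the paper's: let $N = \sum_{i=1}^{n} B_i$ be the number of effects, where, given the probabilities $p_1, \dots, p_n$, the indicators $B_i$ are independent Bernoulli($p_i$), and the $p_i$ are themselves uncertain.

```latex
\begin{align}
\mathrm{E}[N] &= \sum_{i=1}^{n} \mathrm{E}[p_i], \\
\mathrm{Var}(N) &= \mathrm{E}\bigl[\mathrm{Var}(N \mid p)\bigr]
                 + \mathrm{Var}\bigl(\mathrm{E}[N \mid p]\bigr) \\
  &= \mathrm{E}\Bigl[\sum_{i=1}^{n} p_i (1 - p_i)\Bigr]
     + \mathrm{Var}\Bigl(\sum_{i=1}^{n} p_i\Bigr) \\
  &= \sum_{i=1}^{n} \mathrm{E}[p_i]\bigl(1 - \mathrm{E}[p_i]\bigr)
     + \sum_{i \neq j} \mathrm{Cov}(p_i, p_j).
\end{align}
```

When the $p_i$ are uncorrelated, the covariance sum vanishes and the variance is the sum of Bernoulli variances in the mean probabilities (the binomial result when all $\mathrm{E}[p_i]$ are equal); positive correlations add the extra-binomial term.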
- Conference Article
- 10.3997/2214-4609.201409905
- Jan 1, 1994
- 56th EAEG Meeting
A key problem in Reservoir Characterization concerns the description and visualization of reservoir heterogeneities as represented by properties such as porosity, permeability, etc. The inherent nonuniqueness associated with this problem has prompted considerable interest in the development and application of Stochastic Imaging techniques. Such techniques are designed to generate a family of equiprobable images of these properties - each image being consistent with all the available quantitative (well-logs, cores, seismic derived constraint intervals, well-test information) and qualitative (geological interpretations) information, and its spatial correlation characteristics. These stochastic images may be viewed as samples from approximations to the "optimal" Bayesian posterior distribution on the unknown properties. Statistical analysis of the "spread" of this distribution allows for the quantification of risk/uncertainty associated with the spatial variability of these properties. Such distributions can be used in dynamic flow simulations and statistical decision theoretic techniques for optimal forecasting and management of the reservoir.
- Conference Article
2
- 10.2118/24741-ms
- Oct 4, 1992
One of the key problems in Reservoir Characterization involves the description and visualization of reservoir heterogeneities (as represented by the spatial variability of properties such as porosity, permeability, thickness, lithofacies types, fracture / fault orientations, sand body geometry, etc). The inherent nonuniqueness associated with this problem has prompted considerable interest in the development and application of stochastic (i.e. probabilistic) imaging techniques. Such techniques are designed to generate a family of equiprobable descriptions or stochastic images of these parameters - each image being consistent with all the available quantitative (well-logs, cores, seismic derived constraint intervals, well-test information) and qualitative (geological interpretation) information, and its spatial correlation characteristics. These stochastic images may be viewed as samples from approximations to the "optimal" Bayesian posterior distribution on the reservoir parameters. Statistical analysis of the "spread" of this distribution allows for the quantification of risk/uncertainty associated with the spatial variability of these parameters. Also, selected subsets of stochastic images may be "passed" through dynamic flow simulators to assess the distribution of significant production response variables. Such distributions can be incorporated with statistical decision theoretic techniques in order to aid in the optimal forecasting and management of the reservoir. A number of stochastic imaging techniques have been developed in the past few years - indeed this number is rapidly growing. The object of this paper is to assess the current state of the art in stochastic imaging techniques for reservoir characterization, along with associated statistical methodologies for integrating seismic data, and for reservoir performance forecasting and management. The following paragraphs provide an overview of the body of the paper.
Section II of the paper provides a comparative review of stochastic imaging techniques. The reviewed set covers both discrete and continuous single/multivariable methods - these include Boolean algorithms and Marked Point Processes, Indicator methods, (truncated) Gaussian Random Functions, Fractal fields, Simulated Annealing, Markov Random Fields and direct Bayesian Imaging algorithms. Particular attention is given to underlying assumptions, data integration, internal consistency, performance (e.g. exactitude, reproduction of spatial correlation structure, quality of approximation to the Bayesian posterior), computational and inferential complexity, and practical limitations. Also, the techniques are compared with respect to their capabilities for incorporating "soft" information (such as inequality constraints), handling anisotropy and trends (i.e. 1st-order non-stationarities), and for imaging vector variables. The potential of seismic data for adding detail to reservoir descriptions "between the wells" is now generally acknowledged. Section III reviews known techniques for integrating seismic data in reservoir descriptions. This includes recent developments in techniques such as External drift, Cokriging, Markov Random Fields, M.A.P. algorithms, Bayesian (Hard/Soft) Inversion, Markov-Bayes algorithms, and 1D Stochastic Inversion. Brief descriptions are also provided of methods for conditioning the stochastic images to physics-based "forward models", and to qualitative geological information. Section IV summarizes current techniques for utilizing the stochastic images in performance forecasting and reservoir management. This review emphasizes the use of statistical decision theoretic approaches. Also, current progress in conditioning stochastic reservoir models to production / well-test information is summarized.
In section V, illustrative test results are presented of the application of these techniques to both synthetic and, where available, "real" reservoir data sets. Reservoir description applications of hybrid multistep approaches are also summarized - here multiple stochastic imaging algorithms are applied in sequence to compute progressively more detailed descriptions. Section VI presents general guidelines for the use of stochastic imaging techniques on specific reservoir characterization problems. The paper concludes with an overview of open problems and current research directions in this field. These include (computationally feasible) multivariable stochastic imaging, incorporation of seismic information, visualization, utilization of stochastic images in dynamic flow simulations, and decision theoretic techniques for reservoir performance forecasting and management.
- Research Article
3
- 10.5085/0898-5510-21.1.55
- Dec 1, 2009
- Journal of Forensic Economics
A recent article in this journal by Tabak (2006) highlighted a potentially serious source of bias that can arise when multiple statistical tests of a damages theory are performed and even one test rejecting the null hypothesis is regarded as supporting the damages theory. Repeated testing will eventually produce a “false discovery,” that is, rejection of the null hypothesis in favor of the alternative hypothesis when the null hypothesis is true, which statisticians refer to as type I error. Consequently, performing multiple tests without adjusting the critical value can be problematical because it can lead to improperly accepting statistical evidence that apparently supports rejection of the null hypothesis as reliable when it is not. Tabak (2006) recommends making the Sidak (1968, 1971) multiple-comparison adjustment to the standard statistical t-test to correct for the false-discovery bias inherent in multiple-comparison testing. In particular, he recommends making this adjustment when performing 10b-5 securities fraud event studies when more than one corrective disclosure date is involved. This article clarifies the circumstances in which a multiple-comparison adjustment is appropriate and explains why the correction is normally not needed in securities fraud event-study testing. More generally, I explain why it is not required when each of several tests is performed and its results are reported separately, as for example, where the objective is simply to test the statistical significance of the abnormal stock return on each day on which a new and distinct curative disclosure occurred. I show that the Sidak multiple-comparison adjustment is nearly as stringent as the classical Bonferroni procedure (Simes, 1986), which can increase the risk of type II error. 
This article discusses a more powerful alternative to the Sidak adjustment due to Benjamini and Hochberg (1993), which directly corrects for the false-discovery bias in multiple-comparison testing and reduces the risk of type II error.
II. Application of Multiple-Comparison Adjustments to Securities Fraud Event Studies
Multiple-comparison-false-discovery bias can arise when (a) several statistical tests are performed on subsets of the same larger data set in an effort to
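The adjustments named in this abstract can be compared with a short sketch. It uses the textbook forms of the three rules (the Sidak form assumes independent tests, and the step-up rule is the one commonly attributed to Benjamini and Hochberg); the function names and the example p-values are hypothetical and are not taken from the article.

```python
def bonferroni_alpha(alpha, m):
    """Per-test significance level under the Bonferroni adjustment."""
    return alpha / m


def sidak_alpha(alpha, m):
    """Per-test significance level under the Sidak adjustment
    (assumes the m tests are independent)."""
    return 1 - (1 - alpha) ** (1 / m)


def benjamini_hochberg(pvalues, alpha):
    """Step-up rule: reject the k smallest p-values, where k is the largest
    rank with p_(k) <= k * alpha / m. Returns flags in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Hypothetical p-values for five disclosure dates.
pvals = [0.001, 0.012, 0.025, 0.039, 0.9]
print(bonferroni_alpha(0.05, len(pvals)), sidak_alpha(0.05, len(pvals)))
print(benjamini_hochberg(pvals, 0.05))
```

The Sidak per-test level is only slightly larger than the Bonferroni one, which is consistent with the abstract's remark that the two are nearly equally stringent, whereas the step-up rule can reject more hypotheses on the same p-values.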
- Research Article
144
- 10.1088/0266-5611/29/8/085010
- Jul 25, 2013
- Inverse Problems
The Bayesian approach to inverse problems, in which the posterior probability distribution on an unknown field is sampled for the purposes of computing posterior expectations of quantities of interest, is starting to become computationally feasible for partial differential equation (PDE) inverse problems. Balancing the sources of error arising from finite-dimensional approximation of the unknown field, the PDE forward solution map and the sampling of the probability space under the posterior distribution is essential for the design of efficient computational Bayesian methods for PDE inverse problems. We study Bayesian inversion for a model elliptic PDE with an unknown diffusion coefficient. We provide complexity analyses of several Markov chain Monte Carlo (MCMC) methods for the efficient numerical evaluation of expectations under the Bayesian posterior distribution, given data δ. Particular attention is given to bounds on the overall work required to achieve a prescribed error level ε. Specifically, we first bound the computational complexity of ‘plain’ MCMC, based on combining MCMC sampling with linear complexity multi-level solvers for elliptic PDE. Our (new) work versus accuracy bounds show that the complexity of this approach can be quite prohibitive. Two strategies for reducing the computational complexity are then proposed and analyzed: first, a sparse, parametric and deterministic generalized polynomial chaos (gpc) ‘surrogate’ representation of the forward response map of the PDE over the entire parameter space, and, second, a novel multi-level Markov chain Monte Carlo strategy which utilizes sampling from a multi-level discretization of the posterior and the forward PDE. For both of these strategies, we derive asymptotic bounds on work versus accuracy, and hence asymptotic bounds on the computational complexity of the algorithms.
In particular, we provide sufficient conditions on the regularity of the unknown coefficients of the PDE and on the approximation methods used, in order for the accelerations of MCMC resulting from these strategies to lead to complexity reductions over ‘plain’ MCMC algorithms for the Bayesian inversion of PDEs.