Asymptotics for a class of parametric martingale posteriors
Summary: The martingale posterior framework replaces the elicitation of the likelihood and prior with that of a sequence of one-step-ahead predictive densities for Bayesian inference. Posterior sampling then involves the imputation of unobserved quantities and can be carried out in an expedient and parallelizable manner using predictive resampling, without requiring Markov chain Monte Carlo. Recent work has investigated the use of plug-in parametric predictive densities, combined with stochastic gradient descent, to specify a parametric martingale posterior. This paper investigates the asymptotic properties of this class of parametric martingale posteriors. In particular, two central limit theorems based on martingale limit theory are introduced and applied. The first is a predictive central limit theorem, which enables a significant acceleration of the predictive resampling scheme through a hybrid sampling algorithm based on a normal approximation. The second is a Bernstein–von Mises result, which is novel for martingale posteriors, and provides methodological guidance on attaining desirable frequentist properties. We demonstrate the utility of the theoretical results through simulations and a real data example.
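The predictive resampling idea in the summary can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a toy normal location model with known unit variance, a plug-in predictive N(theta, 1), a step size of 1/i, and a finite truncation horizon. The function names `predictive_resample` and `martingale_posterior` are hypothetical.

```python
import random
import statistics


def predictive_resample(theta_n, n, horizon, rng):
    """Return one approximate martingale-posterior draw of a normal mean.

    Assumptions (illustrative only): unit-variance normal model,
    plug-in predictive N(theta, 1), step size 1/i, finite horizon.
    """
    theta = theta_n
    for i in range(n + 1, n + horizon + 1):
        # Impute the next unobserved value from the current plug-in predictive.
        y_new = rng.gauss(theta, 1.0)
        # Stochastic gradient step on the log-likelihood: the score is (y - theta).
        theta += (y_new - theta) / i
    return theta


def martingale_posterior(data, n_samples=500, horizon=2000, seed=0):
    """Draw n_samples independent forward simulations started at the sample mean."""
    rng = random.Random(seed)
    n = len(data)
    theta_n = statistics.fmean(data)
    return [predictive_resample(theta_n, n, horizon, rng) for _ in range(n_samples)]


if __name__ == "__main__":
    observed = [0.1 * (i % 5) for i in range(50)]
    draws = martingale_posterior(observed, n_samples=300, horizon=1000, seed=1)
    print("posterior mean:", statistics.fmean(draws))
    print("posterior variance:", statistics.variance(draws))
```

In this toy model each draw equals the sample mean plus a sum of independent normal increments with variances 1/i^2, so the spread of the draws is close to 1/n, the variance of the flat-prior Bayesian posterior. Each draw is an independent forward simulation, which is why the scheme parallelizes without Markov chain Monte Carlo.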
- Research Article
14
- 10.1515/1544-6115.1765
- Jan 21, 2012
- Statistical Applications in Genetics and Molecular Biology
Problems involving thousands of null hypotheses have been addressed by estimating the local false discovery rate (LFDR). A previous LFDR approach to reporting point and interval estimates of an effect-size parameter uses an estimate of the prior distribution of the parameter conditional on the alternative hypothesis. That estimated prior is often unreliable, and yet strongly influences the posterior intervals and point estimates, causing the posterior intervals to differ from fixed-parameter confidence intervals, even for arbitrarily small estimates of the LFDR. That influence of the estimated prior manifests the failure of the conditional posterior intervals, given the truth of the alternative hypothesis, to match the confidence intervals. Those problems are overcome by changing the posterior distribution conditional on the alternative hypothesis from a Bayesian posterior to a confidence posterior. Unlike the Bayesian posterior, the confidence posterior equates the posterior probability that the parameter lies in a fixed interval with the coverage rate of the coinciding confidence interval. The resulting confidence-Bayes hybrid posterior supplies interval and point estimates that shrink toward the null hypothesis value. The confidence intervals tend to be much shorter than their fixed-parameter counterparts, as illustrated with gene expression data. Simulations nonetheless confirm that the shrunken confidence intervals cover the parameter more frequently than stated. Generally applicable sufficient conditions for correct coverage are given. In addition to having those frequentist properties, the hybrid posterior can also be motivated from an objective Bayesian perspective by requiring coherence with some default prior conditional on the alternative hypothesis. That requirement generates a new class of approximate posteriors that supplement Bayes factors modified for improper priors and that dampen the influence of proper priors on the credibility intervals. 
While that class of posteriors intersects the class of confidence-Bayes posteriors, neither class is a subset of the other. In short, two first principles generate both classes of posteriors: a coherence principle and a relevance principle. The coherence principle requires that all effect size estimates comply with the same probability distribution. The relevance principle means effect size estimates given the truth of an alternative hypothesis cannot depend on whether that truth was known prior to observing the data or whether it was learned from the data.
- Research Article
36
- 10.1137/1127030
- Jan 1, 1983
- Theory of Probability & Its Applications
On the Accuracy of Normal Approximation of the Probability of Hitting a Ball
- Research Article
- 10.1016/j.spl.2024.110194
- Jun 27, 2024
- Statistics and Probability Letters
From law of the iterated logarithm to Zolotarev distance for supercritical branching processes in random environment
- Research Article
16
- 10.1137/1127031
- Jan 1, 1983
- Theory of Probability & Its Applications
Estimate of the Accuracy of Normal Approximation in Hilbert Space. B. A. Zalesskii. Theory of Probability & Its Applications, Volume 27, Issue 2 (1983), pp. 290-298. Submitted 22 October 1981; published online 17 July 2006.
- Research Article
5
- 10.1016/j.jmva.2014.12.003
- Dec 16, 2014
- Journal of Multivariate Analysis
Nonparametric confidence regions for the central orientation of random rotations
- Research Article
27
- 10.1287/stsy.2019.0050
- Jun 1, 2020
- Stochastic Systems
Stochastic gradient descent in continuous time (SGDCT) provides a computationally efficient method for the statistical learning of continuous-time models, which are widely used in science, engineering, and finance. The SGDCT algorithm follows a (noisy) descent direction along a continuous stream of data. The parameter updates occur in continuous time and satisfy a stochastic differential equation. This paper analyzes the asymptotic convergence rate of the SGDCT algorithm by proving a central limit theorem for strongly convex objective functions and, under slightly stronger conditions, for nonconvex objective functions as well. An [Formula: see text] convergence rate is also proven for the algorithm in the strongly convex case. The mathematical analysis lies at the intersection of stochastic analysis and statistical learning.
- Book Chapter
18
- 10.1007/978-1-4419-5780-1_10
- Jan 1, 2010
Many of the special discrete and special continuous distributions that we have discussed can be well approximated by a normal distribution for suitable configurations of their underlying parameters. Typically, the normal approximation works well when the parameter values are such that the skewness of the distribution is small. For example, binomial distributions are well approximated by a normal distribution when n is large and p is not too small or too large. Gamma distributions are well approximated by a normal distribution when the shape parameter α is large. Whenever we see a certain phenomenon empirically all too often, we might expect that there is a unifying mathematical result there, and in this case indeed there is. The unifying mathematical result is one of the most important results in all of mathematics and is called the central limit theorem. The subject of central limit theorems is incredibly diverse. In this chapter, we present the basic or the canonical central limit theorem and its applications to certain problems with which we are already familiar. Among numerous excellent references on central limit theorems, we recommend Feller (1968, 1971) and Pitman (1992) for lucid expositions and examples. The subject of central limit theorems also has a really interesting history; we recommend Le Cam (1986) and Stigler (1986) in this area. Careful and comprehensive mathematical treatments are available in Hall (1992) and Bhattacharya and Rao (1986). For a diverse selection of examples, see DasGupta (2008).
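The claim that a binomial distribution is well approximated by a normal distribution when n is large and p is not extreme can be checked numerically. The sketch below is an illustration, not taken from the chapter: it compares the exact binomial CDF with a normal approximation that uses a continuity correction (an added assumption), and the helper names are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binom_cdf_normal_approx(k, n, p):
    """Normal approximation to P(X <= k), with a continuity correction of 0.5."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return normal_cdf((k + 0.5 - mean) / sd)


def approx_error(k, n, p):
    """Absolute gap between the exact CDF and its normal approximation."""
    return abs(binom_cdf(k, n, p) - binom_cdf_normal_approx(k, n, p))


if __name__ == "__main__":
    # Symmetric case (p = 0.5): small skewness, small error expected.
    print("p = 0.50:", approx_error(55, 100, 0.5))
    # Highly skewed case (p = 0.01): the approximation is visibly worse.
    print("p = 0.01:", approx_error(0, 100, 0.01))
```

Comparing the two printed errors shows the role of skewness described above: for the same n, the approximation degrades as p moves toward zero.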
- Conference Article
11
- 10.1145/3097983.3098147
- Aug 13, 2017
Efficiency of large-scale learning is a hot topic in both academia and industry. The stochastic gradient descent (SGD) algorithm, and its extension mini-batch SGD, allow the model to be updated without scanning the whole data set. However, the use of approximate gradients introduces uncertainty, which slows the decrease of the objective function. Furthermore, such uncertainty may result in a high frequency of meaningless updates to the model, causing a communication issue in parallel learning environments. In this work, we develop a batch-adaptive stochastic gradient descent (BA-SGD) algorithm, which can dynamically choose a proper batch size as learning proceeds. In particular, on the basis of Taylor expansion and the central limit theorem, it models the decrease of the objective value as a Gaussian random walk game with rebound. In this game, a heuristic strategy for determining the batch size is adopted to maximize the utility of each incremental sampling. By evaluation on multiple real data sets, we demonstrate that by smartly choosing the batch size, BA-SGD not only preserves the fast convergence of the SGD algorithm but also avoids overly frequent model updates.
- Research Article
12
- 10.1090/tran/8459
- Jul 19, 2021
- Transactions of the American Mathematical Society
In this article, we try to give an answer to the simple question: "What is the optimal growth rate of the dimension $p$ as a function of the sample size $n$ for which the Central Limit Theorem (CLT) holds uniformly over the collection of $p$-dimensional hyper-rectangles?". Specifically, we are interested in the normal approximation of suitably scaled versions of the sum $\sum_{i=1}^{n} X_i$ in $\mathcal{R}^p$ uniformly over the class of hyper-rectangles $\mathcal{A}^{re} = \{\prod_{j=1}^{p} [a_j, b_j] \cap \mathcal{R} : -\infty \leq a_j \leq b_j \leq \infty,\ j = 1, \ldots, p\}$, where $X_1, \dots, X_n$ are independent $p$-dimensional random vectors with each having independent and identically distributed (iid) components. We investigate the optimal cut-off rate of $\log p$ below which the uniform CLT holds and above which it fails. According to some recent results of Chernozhukov et al. [Ann. Probab. 45 (2017), pp. 2309–2352], it is well known that the CLT holds uniformly over $\mathcal{A}^{re}$ if $\log p = o\big(n^{1/7}\big)$. They also conjectured that for CLT to hold uniformly over $\mathcal{A}^{re}$, the optimal rate is $\log p = o\big(n^{1/3}\big)$. We show instead that under some suitable conditions on the even moments and under vanishing odd moments, the CLT holds uniformly over $\mathcal{A}^{re}$ when $\log p = o\big(n^{1/2}\big)$. More precisely, we show that if $\log p = \epsilon \sqrt{n}$ for some sufficiently small $\epsilon > 0$, the normal approximation is valid with an error $\epsilon$, uniformly over $\mathcal{A}^{re}$. Further, we show by an example that the uniform CLT over $\mathcal{A}^{re}$ fails if $\limsup_{n \rightarrow \infty} n^{-(1/2+\delta)} \log p > 0$ for some $\delta > 0$. Therefore, with some moment conditions the optimal rate of the growth of $p$ for the validity of the CLT is given by $\log p = o\big(n^{1/2}\big)$.
- Book Chapter
- 10.1007/978-3-319-34139-2_15
- Jan 1, 2016
In recent years, much work has been done on high-dimensional problems in both theory and applications, since high-dimensional data are becoming more common in broad areas such as microarray data analysis. One important issue in multiple testing with high-dimensional data is controlling the significance level of large-scale simultaneous testing in order to select the significant genes among a huge number of genes. In many cases, the true null distribution is assumed to be well known or to follow a parametric distribution, so that p-values can be easily calculated. In practice, the true null distribution may be misspecified or different from the assumed distribution. In this paper, we consider an FDR procedure based on extreme values that is less sensitive to inaccurate p-values. The normalized forms are assumed to be approximately standard normal by the central limit theorem (CLT). Compared with the CLT approximation, we show that the FDR procedure with extreme values achieves a more accurate simultaneous test level under some weaker conditions on sample sizes. We provide simulation studies and a real data example to compare the performance of our proposed procedure and an existing procedure.
- Research Article
74
- 10.1137/1125089
- Jan 1, 1981
- Theory of Probability & Its Applications
Uniform Estimates of the Rate of Convergence in the Multi-Dimensional Central Limit Theorem
- Research Article
24
- 10.1137/1115072
- Jan 1, 1970
- Theory of Probability & Its Applications
A Non-Uniform Estimate for the Convergence Speed in the Multi-Dimensional Central Theorem
- Single Book
4
- 10.1002/9780470486979
- May 4, 2009
Preface.
1. Probability and Sample Spaces. Why Study Probability? Probability. Sample Spaces. Some Properties of Probabilities. Finding Probabilities of Events. Conclusions. Explorations.
2. Permutations and Combinations: Choosing the Best Candidate Acceptance Sampling. Permutations. Counting Principle. Permutations with Some Objects Alike. Permuting Only Some of the Objects. Combinations. General Addition Theorem and Applications. Conclusions. Explorations.
3. Conditional Probability. Introduction. Some Notation. Bayes' Theorem. Conclusions. Explorations.
4. Geometric Probability. Conclusion. Explorations.
5. Random Variables and Discrete Probability Distributions-Uniform, Binomial, Hypergeometric, and Geometric Distributions. Introduction. Discrete Uniform Distribution. Mean and Variance of a Discrete Random Variable. Intervals, σ, and German Tanks. Sums. Binomial Probability Distribution. Mean and Variance of the Binomial Distribution. Sums. Hypergeometric Distribution. Other Properties of the Hypergeometric Distribution. Geometric Probability Distribution. Conclusions. Explorations.
6. Seven-Game Series in Sports. Introduction. Seven-Game Series. Winning the First Game. How Long Should the Series Last? Conclusions. Explorations.
7. Waiting Time Problems. Waiting for the First Success. The Mythical Island. Waiting for the Second Success. Waiting for the rth Success. Mean of the Negative Binomial. Collecting Cereal Box Prizes. Heads Before Tails. Waiting for Patterns. Expected Waiting Time for HH. Expected Waiting Time for TH. An Unfair Game with a Fair Coin. Three Tosses. Who Pays for Lunch? Expected Number of Lunches. Negative Hypergeometric Distribution. Mean and Variance of the Negative Hypergeometric. Negative Binomial Approximation. The Meaning of the Mean. First Occurrences. Waiting Time for c Special Items to Occur. Estimating k. Conclusions. Explorations.
8. Continuous Probability Distributions: Sums, the Normal Distribution, and the Central Limit Theorem Bivariate Random Variables. Uniform Random Variable. Sums. A Fact About Means. Normal Probability Distribution. Facts About Normal Curves. Bivariate Random Variables. Variance. Central Limit Theorem: Sums. Central Limit Theorem: Means. Central Limit Theorem. Expected Values and Bivariate Random Variables. Means and Variances of Means. A Note on the Uniform Distribution. Conclusions. Explorations.
9. Statistical Inference I. Estimation. Confidence Intervals. Hypothesis Testing. β and the Power of a Test. p-Value for a Test. Conclusions. Explorations.
10. Statistical Inference II: Continuous Probability Distributions II-Comparing Two Samples. The Chi-Squared Distribution. Statistical Inference on the Variance. Student t Distribution. Testing the Ratio of Variances: The F Distribution. Tests on Means from Two Samples. Conclusions. Explorations.
11. Statistical Process Control. Control Charts. Estimating σ Using the Sample Standard Deviations. Estimating σ Using the Sample Ranges. Control Charts for Attributes. np Control Chart. p Chart. Some Characteristics of Control Charts. Some Additional Tests for Control Charts. Conclusions. Explorations.
12. Nonparametric Methods. Introduction. The Rank Sum Test. Order Statistics. Median. Maximum. Runs. Some Theory of Runs. Conclusions. Explorations.
13. Least Squares, Medians, and the Indy 500. Introduction. Least Squares. Principle of Least Squares. Influential Observations. The Indy 500. A Test for Linearity: The Analysis of Variance. A Caution. Nonlinear Models. The Median-Median Line. When Are the Lines Identical? Determining the Median-Median Line. Analysis for Years 1911-1969. Conclusions. Explorations.
14. Sampling. Simple Random Sampling. Stratification. Proportional Allocation. Optimal Allocation. Some Practical Considerations. Strata. Conclusions. Explorations.
15. Design of Experiments. Yates Algorithm. Randomization and Some Notation. Confounding. Multiple Observations. Design Models and Multiple Regression Models. Testing the Effects for Significance. Conclusions. Explorations.
16. Recursions and Probability. Introduction. Conclusions. Explorations.
17. Generating Functions and the Central Limit Theorem. Means and Variances. A Normal Approximation. Conclusions. Explorations.
Bibliography. Where to Learn More. Index.
- Research Article
5
- 10.1080/03610918.2021.2012192
- Nov 29, 2021
- Communications in Statistics - Simulation and Computation
The a priori procedure (APP) is concerned with determining appropriate sample sizes to ensure that sample statistics to be obtained are likely to be good estimates of corresponding population parameters. Previous APP work pertaining to proportions has used the normal approximation to the binomial distribution, but this is problematic when the population proportion is near zero or one. The present contribution addresses the issue in four ways. First, we add a skew normal approximation that does a better job than the normal approximation. Second, we add a Bayesian component making use of a prior beta distribution that is conjugate to the binomial distribution. Third, we provide simulations and real data examples, one of which is a set of Covid-19 data. Finally, we include free and user-friendly computer programs to aid researchers in making the calculations.
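The conjugacy mentioned above means that a beta prior combined with binomial data gives a beta posterior in closed form. The sketch below shows only that standard update, not the APP sample-size calculation or the skew normal approximation from the article; the function names and the uniform Beta(1, 1) prior in the usage lines are illustrative assumptions.

```python
def beta_binomial_update(a, b, successes, trials):
    """Posterior Beta parameters after observing binomial data.

    Prior Beta(a, b) with `successes` out of `trials` gives
    posterior Beta(a + successes, b + trials - successes).
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return a + successes, b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


if __name__ == "__main__":
    # Assumed uniform prior Beta(1, 1) and 2 successes in 10 trials.
    a_post, b_post = beta_binomial_update(1, 1, successes=2, trials=10)
    print("posterior parameters:", a_post, b_post)
    print("posterior mean:", beta_mean(a_post, b_post))
```

Because the posterior stays inside the unit interval, it remains usable when the proportion is near zero or one, the setting where the normal approximation is described as problematic.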
- Research Article
6
- 10.1002/cjs.10075
- Aug 31, 2010
- Canadian Journal of Statistics
In this article the author investigates the application of the empirical‐likelihood‐based inference for the parameters of varying‐coefficient single‐index model (VCSIM). Unlike the usual cases, if there is no bias correction the asymptotic distribution of the empirical likelihood ratio cannot achieve the standard chi‐squared distribution. To this end, a bias‐corrected empirical likelihood method is employed to construct the confidence regions (intervals) of regression parameters, which have two advantages, compared with those based on normal approximation, that is, (1) they do not impose prior constraints on the shape of the regions; (2) they do not require the construction of a pivotal quantity and the regions are range preserving and transformation respecting. A simulation study is undertaken to compare the empirical likelihood with the normal approximation in terms of coverage accuracies and average areas/lengths of confidence regions/intervals. A real data example is given to illustrate the proposed approach. The Canadian Journal of Statistics 38: 434–452; 2010 © 2010 Statistical Society of Canada