Enhanced upper confidence limits via randomized tests in random sampling without replacement

Abstract

In this paper we study one-sided hypothesis testing under random sampling without replacement, which frequently appears in cryptographic problem settings, including the verification of measurement-based quantum computation. Suppose that $n+1$ binary random variables $X_1,\ldots, X_{n+1}$ follow a permutation-invariant distribution and the $n$ binary random variables $X_1,\ldots, X_{n}$ are observed. We propose randomized tests with a randomization parameter for the expectation of the $(n+1)$-th random variable $X_{n+1}$ under a given significance level $\delta>0$. Our randomized tests significantly improve the upper confidence limit over deterministic tests. Beyond cryptographic scenarios, our problem setting also commonly appears in machine learning when adversarial examples are considered. Such studies are essential for expanding the applicable area of statistics. Although this paper addresses only binary random variables, a similar significant improvement by randomized tests can be expected for general non-binary random variables.
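To illustrate why randomization can tighten an upper confidence limit under sampling without replacement, here is a minimal sketch of the classical textbook construction: a lower-tail hypergeometric test, inverted into an upper limit on the number of ones in a finite population. It is not the construction proposed in the paper. The population size `N`, the function names, and the randomization draw `u` are assumptions made for this illustration only.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(S = k) when n items are drawn without replacement from N items, K of which are 1."""
    if k < max(0, n - (N - K)) or k > min(K, n):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def upper_limit(s, N, n, delta, u=1.0):
    """Largest K not rejected by the lower-tail test at level delta.

    s is the observed number of ones among the n sampled items.
    u = 1.0 gives the deterministic test; u drawn uniformly from [0, 1)
    gives the randomized test (hypothetical parameter name).
    Falls back to 0 if every K is rejected.
    """
    best = 0
    for K in range(N + 1):
        below = sum(hypergeom_pmf(k, N, K, n) for k in range(s))
        tail = below + u * hypergeom_pmf(s, N, K, n)  # randomized tail probability
        if tail > delta:
            best = K
    return best


# Example: 10 of 20 items sampled, no ones observed, delta = 0.05.
deterministic = upper_limit(0, 20, 10, 0.05)         # u = 1.0
randomized = upper_limit(0, 20, 10, 0.05, u=0.4)     # one fixed draw of u
```

Because the randomized tail probability never exceeds the deterministic one, the randomized limit is never larger than the deterministic limit for the same data.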

Similar Papers
  • Research Article
  • Cite Count: 86
  • 10.1093/biomet/58.1.129
The analysis of several 2 × 2 contingency tables
  • Jan 1, 1971
  • Biometrika
  • M Zelen

SUMMARY Consider data arranged into k 2 x 2 contingency tables. The principal result is the derivation of a statistical test for making an inference on whether each of the k contingency tables has the same relative risk. The test is based on a conditional reference set and can be regarded as an extension of the Fisher-Irwin treatment of a single 2 x 2 contingency table. Both exact and asymptotic procedures are presented. The analysis of k 2 x 2 contingency tables is required in several contexts. The two principal ones are (i) the comparison of binary response random variables, i.e. random variables taking on the values zero or one, for two treatments, over a spectrum of different conditions or populations; and (ii) the comparison of the degree of association among two binary random variables over k different populations. Cochran (1954) has investigated this problem with respect to testing if the success probability for each of two treatments is the same for every contingency table. Cochran's recommendation is that the equality of the two success probabilities should be tested using the total number, summed over all tables, of successes for one of the treatments. Cochran considers the asymptotic distribution of the total number of successes, for one of the treatments, conditional on all marginals being fixed in every table. He recommends this technique whenever the difference between the two populations on a logistic or probit scale is nearly constant for each contingency table. The constant logistic difference is equivalent to the relative risk being equal for each table. Mantel & Haenszel (1959), in an important paper discussing retrospective studies, have also proposed an asymptotic method for analysing several 2 x 2 contingency tables. Their work on this problem was evidently done independently of Cochran, for their method is exactly the same as Cochran's except for a modification dealing with the correction factor associated with a finite population.
Birch (1964) and Cox (1966) clarified the problem by showing that, under the assumption of constant logistic differences for each table (i.e. the same relative risk), the conditional distribution of the total number of successes, for one of the treatments, leads to a uniformly most powerful unbiased test. Birch and Cox also derived the exact probability distribution of this conditional random variable under the given model. In this paper, we investigate the more general situation where the difference between the logits in each table is not necessarily constant. Procedures are derived for making an inference with regard to the hypothesis of constant logistic differences. Both the exact and asymptotic distributions are derived for the null and nonnull cases. This problem has been discussed by several investigators. A constant logistic difference corresponds to no interaction between the treatments and the k populations. The case k = 2 corresponds to one in which Bartlett (1935) has derived both an exact and an asymptotic procedure. Norton (1945)

  • Research Article
  • Cite Count: 6
  • 10.2307/2532798
An Extension of Yule's Q to Multivariate Binary Data
  • Sep 1, 1994
  • Biometrics
  • Stuart R Lipsitz + 1 more

In this note we describe a summary measure of pairwise association for multivariate binary data based on the conditional odds ratio. The proposed measure is an extension of Yule's Q to more than two binary random variables. Unlike marginal measures of association, this measure is not constrained by the marginal probabilities of success. For example, when each binary variable has a different probability of success, the upper limit of the pairwise marginal correlation coefficient is constrained to be less than 1. If one prefers a measure of association that is unconstrained, then with only two binary variables, Bishop, Fienberg, and Holland (1975, Discrete Multivariate Analysis: Theory and Practice, Cambridge, Massachusetts: MIT Press) suggest the use of the odds ratio or, equivalently, Yule's Q. Yule's Q transforms the odds ratio between the two binary variables from [0, infinity) to [-1, 1]. We propose an extension of Yule's Q to more than two binary random variables. This measure of pairwise association is based on the conditional odds ratio from a log-linear model.
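For the two-variable case mentioned above, the standard definition of Yule's Q is $Q = (\mathrm{OR} - 1)/(\mathrm{OR} + 1)$, which maps an odds ratio in $[0, \infty)$ to $[-1, 1]$. The following minimal sketch computes it from the four cell counts of a 2 x 2 table. The function and argument names are chosen for this illustration, and the multivariate extension based on the conditional odds ratio from a log-linear model is not shown.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2 x 2 table with cell counts [[a, b], [c, d]].

    Uses Q = (ad - bc) / (ad + bc), which equals (OR - 1) / (OR + 1)
    for the odds ratio OR = ad / (bc) but stays defined when bc = 0.
    """
    numerator = a * d - b * c
    denominator = a * d + b * c
    if denominator == 0:
        raise ValueError("Q is undefined when ad + bc = 0")
    return numerator / denominator


def q_from_odds_ratio(odds_ratio):
    """Map a finite odds ratio in [0, infinity) to the interval [-1, 1)."""
    return (odds_ratio - 1.0) / (odds_ratio + 1.0)


# Example: odds ratio (20 * 20) / (5 * 5) = 16, so Q = 15 / 17.
q = yules_q(20, 5, 5, 20)
```

An odds ratio of 1 (no association) gives Q = 0, and Q approaches 1 or -1 as the association becomes perfectly positive or negative.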

  • Research Article
  • 10.1016/j.jkss.2014.12.004
Type II combination questionnaire model: A new survey design for a totally sensitive binary variable correlated with another nonsensitive binary variable
  • Jan 26, 2015
  • Journal of the Korean Statistical Society
  • Xifen Huang + 3 more


  • Research Article
  • 10.15622/sp.14.10
A Bayesian belief network directed cycle with multinomial random variables
  • Mar 17, 2014
  • SPIIRAS Proceedings
  • Nataliya Alexandrona Valtman + 1 more

The paper generalizes the transformation of a directed cycle in Bayesian belief networks (BBN) with binary random variables into a knowledge patterns chain in algebraical Bayesian networks (ABN) for the case of multivariate random variables. Under the assumption that multivariate random variables are represented with binary random variables conjuncts, the generalized transformation consists of the same steps as the original one. First, we form stochastic matrices that correspond to conditional probability tensors in the cycle nodes. Then we calculate the product of the matrices and find out the stochastic eigen-vector of the product result. The eigen-vector represents the probabilistic distribution of cycle node random variable assignments. Later on, this distribution is used in calculations of joint distributions for random variables assignments in couples of neighboring nodes. Finally, an ABN knowledge patterns cycle is constructed with the set of latter joint distributions, and then an ABN knowledge pattern chain is constructed with the latter cycle. The method for the chain reconciliation is known.
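The matrix steps described above (multiply the stochastic matrices, then find the stochastic eigenvector of the product) can be sketched as follows. This is only an illustration under assumed conventions: the matrices are taken to be column-stochastic, the example values `A` and `B` are made up, and power iteration is swapped in as one simple way to find the eigenvector for eigenvalue 1. None of this is taken from the paper itself.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def stochastic_eigenvector(m, iters=200):
    """Power iteration for the probability vector v with m v = v.

    Assumes m is column-stochastic (each column sums to 1) and that the
    iteration converges, which holds for the strictly positive example below.
    """
    n = len(m)
    v = [1.0 / n] * n
    for _ in range(iters):
        v = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        v = [x / total for x in v]  # renormalize to guard against drift
    return v


# Hypothetical conditional probability matrices for a two-node cycle.
A = [[0.9, 0.2], [0.1, 0.8]]
B = [[0.7, 0.4], [0.3, 0.6]]

cycle_product = matmul(B, A)
pi = stochastic_eigenvector(cycle_product)
```

Here `pi` plays the role of the distribution over the assignments of one cycle node; the later steps that build joint distributions for neighboring nodes are not reproduced.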

  • Research Article
  • 10.1088/1402-4896/adfb07
A canonical probability distribution for a mixture of continuous and binary random variables
  • Nov 1, 2025
  • Physica Scripta
  • Takashi Arai

We propose a multivariate probability distribution that models a linear correlation between continuous and binary variables. The proposed distribution is a natural extension of the previously developed multivariate binary distribution known as the Grassmann distribution. The Grassmann distribution has desirable theoretical properties similar to the multivariate normal distribution, and is parametrized by a $P_0$-matrix that is necessary to ensure the model probabilities to be nonnegative. By using the property of the $P_0$-matrix, we successfully introduce interactions between continuous and binary variables while ensuring that all joint probabilities are nonnegative. We refer to the proposed distribution as canonical in the sense that it is mathematically simple and natural. Using artificial data, we numerically validate the representational capabilities of the proposed model. We further investigate the sampling distribution of the maximum likelihood estimator and empirically observe the consistency of the maximum likelihood estimator. We also construct statistical machine learning methods for classification and clustering using the proposed distribution and demonstrate the usefulness of this distribution.

  • Research Article
  • 10.2139/ssrn.3061783
Stochastic Programs with Binary Distributions: Structural Properties of Scenario Trees and Algorithms
  • Jan 1, 2017
  • SSRN Electronic Journal
  • Vit Prochazka + 1 more

Binary random variables often represent things such as customers that are present or not, roads that are open or not, machines that are operable or not. At the same time, stochastic programs often apply to situations where penalties are accumulated when demand is not met, travel times are too long, or profits too low. Typical for these situations is that the penalties imply a partition of the scenarios into two sets: those that can result in penalties for some decisions, and those that never lead to penalties. We demonstrate how this observation can be used to efficiently calculate out-of-sample values, find good scenario trees and generally simplify calculations. Most of our observations apply to general integer random variables, and not just the 0/1 case.

  • Research Article
  • Cite Count: 16
  • 10.1007/s10287-018-0312-2
Stochastic programs with binary distributions: structural properties of scenario trees and algorithms
  • May 19, 2018
  • Computational Management Science
  • Vit Prochazka + 1 more

Binary random variables often represent things such as customers that are present or not, roads that are open or not, machines that are operable or not. At the same time, stochastic programs often apply to situations where penalties are accumulated when demand is not met, travel times are too long, or profits too low. Typical for these situations is that the penalties imply a partial order on the scenarios, leading to a partition of the scenarios into two sets: those that can result in penalties for some decisions, and those that never lead to penalties. We demonstrate how this observation can be used to efficiently calculate out-of-sample values, find good scenario trees and generally simplify calculations. Most of our observations apply to general integer random variables, and not just the 0/1 case.

  • Conference Article
  • Cite Count: 3
  • 10.1109/acc.2010.5530967
On generating sets of binary random variables with specified first- and second- moments
  • Jun 1, 2010
  • Mengran Xue + 3 more

We study the problem of generating sets of binary random variables with specified means and pairwise correlations (i.e., specified individual- and pairwise-joint- probabilities). We propose a low-complexity algorithm for generating such correlated random variables, that involves first generating a set of mutually independent “source” binary random variables and then constructing the desired random variables by randomly selecting from and probabilistically copying or anticopying the source variables. We show that the parameters of this data-generation algorithm can be easily designed to achieve the desired statistics, under broad conditions.

  • Research Article
  • Cite Count: 1
  • 10.1155/2013/827048
A Survey Design for a Sensitive Binary Variable Correlated with Another Nonsensitive Binary Variable
  • Jan 1, 2013
  • Journal of Probability and Statistics
  • Jun-Wu Yu + 2 more

Tian et al. (2007) introduced a so-called hidden sensitivity model for evaluating the association of two sensitive questions with binary outcomes. However, in practice, we sometimes need to assess the association between one sensitive binary variable (e.g., whether or not one is a drug user, whether the number of sex partners is ≤1 or >1, and so on) and one nonsensitive binary variable (e.g., good or poor health status, with or without cervical cancer, and so on). To address this issue, by fully utilizing the information contained in the nonsensitive binary variable, in this paper we propose a new survey scheme, called the combination questionnaire design/model, which consists of a main questionnaire and a supplemental questionnaire. The introduction of the supplemental questionnaire, which is in fact a direct-questioning design, can effectively reduce noncompliance behavior since more respondents will not be faced with the sensitive question. Likelihood-based inferences including maximum likelihood estimates via the expectation-maximization algorithm, asymptotic confidence intervals, and bootstrap confidence intervals of parameters of interest are derived. A likelihood ratio test is provided to test the association between the two binary random variables. Bayesian inferences are also discussed. Simulation studies are performed, and a cervical cancer data set in Atlanta is used to illustrate the proposed methods.

  • Research Article
  • Cite Count: 28
  • 10.1016/j.pmrj.2017.02.001
Randomization Test: An Alternative Analysis for the Difference of Two Means
  • Feb 16, 2017
  • PM&R
  • Regina L Nuzzo


  • Research Article
  • Cite Count: 103
  • 10.1109/jproc.2002.1015000
Information sources using chaotic dynamics
  • May 1, 2002
  • Proceedings of the IEEE
  • T Kohda

A sequence of binary random variables has found significant applications in modern digital communication systems. For such sequences, several kinds of linear feedback shift register sequences have been proposed. It is, however, well known in probability theory that the Bernoulli shift is a fundamental theoretic model of a sequence of independent identically distributed (i.i.d.) binary random variables. In this paper, after reviewing fundamental subjects of chaotic dynamics, in particular a close relationship between information sources and Markov chains, we give the generation method of sequences of i.i.d. binary random variables using chaotic dynamics. Such a generation method is given as a sufficient condition composed of simple symmetric properties for some class of ergodic maps. Furthermore, we give the applications of such sequences: (1) to running-key sequences for stream cipher systems and (2) to a color image communication system through code-division multiple access channels and its extended version, a digital watermarking system. In addition, the performance of spread spectrum codes generated by a Markov chain is theoretically evaluated in asynchronous direct-sequence/code-division multiple access systems.

  • Research Article
  • Cite Count: 552
  • 10.1137/0222053
Small-Bias Probability Spaces: Efficient Constructions and Applications
  • Aug 1, 1993
  • SIAM Journal on Computing
  • Joseph Naor + 1 more

It is shown how to efficiently construct a small probability space on n binary random variables such that for every subset, its parity is either zero or one with “almost” equal probability. They are called $\epsilon$-biased random variables. The number of random bits needed to generate the random variables is $O(\log n + \log \frac{1}{\epsilon})$. Thus, if $\epsilon$ is polynomially small, then the size of the sample space is also polynomial. Random variables that are $\epsilon$-biased can be used to construct “almost” k-wise independent random variables where $\epsilon$ is a function of k. These probability spaces have various applications: 1. Derandomization of algorithms: Many randomized algorithms that require only k-wise independence of their random bits (where k is bounded by $O(\log n)$), can be derandomized by using $\epsilon$-biased random variables. 2. Reducing the number of random bits required by certain randomized algorithms, e.g., verification of matrix multiplication. 3. Exhaustive testing of combinatorial circuits. The smallest known family for such testing is provided. 4. Communication complexity: Two parties can verify equality of strings with high probability exchanging only a logarithmic number of bits. 5. Hash functions: A polynomial sized family of hash functions such that with high probability the sum of a random function over two different sets is not equal can be constructed.

  • Conference Article
  • Cite Count: 164
  • 10.1145/100216.100244
Small-bias probability spaces: efficient constructions and applications
  • Jan 1, 1990
  • J Naor + 1 more

It is shown how to efficiently construct a small probability space on n binary random variables such that for every subset, its parity is either zero or one with “almost” equal probability. They are called $\epsilon$-biased random variables. The number of random bits needed to generate the random variables is $O(\log n + \log \frac{1}{\epsilon})$. Thus, if $\epsilon$ is polynomially small, then the size of the sample space is also polynomial. Random variables that are $\epsilon$-biased can be used to construct “almost” k-wise independent random variables where $\epsilon$ is a function of k. These probability spaces have various applications: 1. Derandomization of algorithms: Many randomized algorithms that require only k-wise independence of their random bits (where k is bounded by $O(\log n)$), can be derandomized by using $\epsilon$-biased random variables. 2. Reducing the number of random bits required by certain randomized algorithms, e.g., verification of matrix multiplication. 3. Exhaustive tes...

  • Research Article
  • 10.5555/1231159.1231180
A Property of Independency Relations Induced by Probabilistic Distributions with Binary Variables
  • Jul 1, 2006
  • Fundamenta Informaticae
  • Pazazaria

The relationship between graphoid independency relations (defined in the text) and such relations induced by Probabilistic Distributions (PD) with binary random variables is investigated. It is sho...

  • Conference Article
  • Cite Count: 7
  • 10.23919/eusipco.2018.8553490
Robust Expectation Propagation in Factor Graphs Involving Both Continuous and Binary Variables
  • Sep 1, 2018
  • Marco Cox + 1 more

Factor graphs provide a convenient framework for automatically generating (approximate) Bayesian inference algorithms based on message passing. Examples include the sum-product algorithm (belief propagation), expectation maximization (EM), expectation propagation (EP) and variational message passing (VMP). While these message passing algorithms can be generated automatically, they depend on a library of precomputed message update rules. As a result, the applicability of the factor graph approach depends on the availability of such rules for all involved nodes. This paper describes the probit factor node for linking continuous and binary random variables in a factor graph. We derive (approximate) sum-product message update rules for this node through constrained moment matching, which leads to a robust version of the EP algorithm in which all messages are guaranteed to be proper. This enables automatic Bayesian inference in probabilistic models that involve both continuous and discrete latent variables, without the need for model-specific derivations. The usefulness of the node as a factor graph building block is demonstrated by applying it to perform Bayesian inference in a linear classification model with corrupted class labels.
