Abstract

Consider an experiment in which $p$ independent populations $\pi_{i}$ with corresponding unknown means $\theta_{i}$ are available, and suppose that for every $1\leq i\leq p$ we can obtain a sample $X_{i1},\ldots,X_{in}$ from $\pi_{i}$. In this context, researchers are sometimes interested in selecting the populations that yield the largest sample means in the experiment and then estimating the corresponding population means $\theta_{i}$. In this paper, we present a frequentist approach to the problem and discuss how to construct simultaneous confidence intervals for the means of the $k$ selected populations, assuming that the populations $\pi_{i}$ are independent and normally distributed with a common variance $\sigma^{2}$. The method, which is based on minimizing the coverage probability, yields confidence intervals that attain the nominal coverage probability for any $p$ and $k$ while taking the selection procedure into account.
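Why the selection step matters for interval estimation can be illustrated with a small Monte Carlo sketch. It is not the authors' coverage-minimization method. It assumes a known $\sigma$, simulates the sample means directly as $N(\theta_i, \sigma^2/n)$, and compares naive 95% intervals $\bar X_i \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ for the $k$ selected populations with a conservative Šidák adjustment that holds simultaneously for all $p$ means. The helper name `coverage_after_selection` and all numerical settings are hypothetical.

```python
import random
from statistics import NormalDist


def coverage_after_selection(theta, sigma, n, k, crit, reps=20000, seed=1):
    """Monte Carlo estimate of P(every one of the k selected means is covered).

    Hypothetical helper: intervals are xbar_i +/- crit * sigma / sqrt(n),
    reported only for the k populations with the largest sample means.
    """
    rng = random.Random(seed)
    se = sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        # Under the normal model the sample mean of population i is N(theta_i, sigma^2/n).
        xbar = [rng.gauss(t, se) for t in theta]
        # Selection step: indices of the k largest sample means.
        top = sorted(range(len(theta)), key=lambda i: xbar[i], reverse=True)[:k]
        # Simultaneous coverage: every selected interval must contain its true mean.
        if all(abs(xbar[i] - theta[i]) <= crit * se for i in top):
            hits += 1
    return hits / reps


alpha, p, k = 0.05, 10, 3
std = NormalDist()
naive = std.inv_cdf(1 - alpha / 2)                     # ignores the selection step
sidak = std.inv_cdf((1 + (1 - alpha) ** (1 / p)) / 2)  # simultaneous over all p means

theta = [0.0] * p  # illustrative configuration: all population means equal
print("naive coverage:", coverage_after_selection(theta, 1.0, 5, k, naive))
print("Sidak coverage:", coverage_after_selection(theta, 1.0, 5, k, sidak))
```

With all ten means equal, the naive intervals jointly cover the three selected means only roughly 78% of the time, because the largest sample mean tends to overshoot its target. The Šidák intervals hold for all $p$ means at once, so they also hold for any selected subset, and they overcover (about 97%) in this configuration.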

Highlights

  • Given a set of $p$ available features, researchers must often determine which one is the best, or rank them according to a certain prespecified criterion

  • Gupta and coauthors pioneered the subset selection approach, in which a subset of populations is selected so that it contains the population with the largest mean with probability at least $P^*$ [see 15]. Note that both Bechhofer's approach and the subset selection approach are mainly concerned with the correct selection of the population with the largest mean rather than with estimation of the selected mean. This second problem has been widely discussed in the literature, and in the following two sections we present a brief summary of the main findings, giving separate consideration to point estimation and interval estimation procedures

  • Dahiya [12] addresses this problem for the case of two normal populations and proposes estimators that perform better in terms of mean squared error (MSE)
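The two-population estimation problem studied by Dahiya [12] can be made concrete by simulating the natural estimator, which reports the larger of the two sample means as the estimate of the selected population's mean. Dahiya's improved estimators are not reproduced here. This sketch only measures the bias and MSE of the natural estimator, assuming both sample means have the same known standard error `se`; the helper name `natural_estimator_error` is hypothetical.

```python
import math
import random


def natural_estimator_error(theta1, theta2, se, reps=100000, seed=7):
    """Return (bias, MSE) of max(xbar1, xbar2) as an estimate of the selected mean.

    Hypothetical helper; xbar_i ~ N(theta_i, se^2) independently.
    """
    rng = random.Random(seed)
    sum_err = 0.0
    sum_sq = 0.0
    for _ in range(reps):
        x1, x2 = rng.gauss(theta1, se), rng.gauss(theta2, se)
        # The estimand is the true mean of whichever population was selected.
        est, target = (x1, theta1) if x1 >= x2 else (x2, theta2)
        err = est - target
        sum_err += err
        sum_sq += err * err
    return sum_err / reps, sum_sq / reps


bias, mse = natural_estimator_error(0.0, 0.0, se=1.0)
# With equal means the exact bias is se / sqrt(pi) and the exact MSE is se**2.
print(f"bias={bias:.3f} (theory {1 / math.sqrt(math.pi):.3f}), mse={mse:.3f}")
```

The positive bias is largest when the two means coincide and shrinks toward zero as $|\theta_1-\theta_2|$ grows, because the selection then almost always picks the same population.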



Introduction

Given a set of $p$ available features, researchers must often determine which one is the best, or rank them according to a certain prespecified criterion. For instance, researchers may be interested in determining which treatment is the most efficient in fighting a certain disease, or in ranking the levels of gene expression in a genomics experiment. Problems of this type are commonly referred to as ranking and selection problems, and specific solutions and methods have been proposed in the literature since the second half of the 20th century, with origins usually traced back to the pathbreaking works of Bechhofer [2] and Gupta & Sobel [16]. Gupta and coauthors pioneered the subset selection approach, in which a subset of populations is selected so that it contains the population with the largest mean with probability at least $P^*$ [see 15]. Note that both of these approaches are mainly concerned with the correct selection of the population with the largest mean rather than with estimation of the selected mean. This second problem has been widely discussed in the literature, and in the following two sections we present a brief summary of the main findings, giving separate consideration to point estimation and interval estimation procedures.
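The subset selection idea can be sketched for normal means with a known common variance $\sigma^2$ and equal sample sizes $n$. The sketch uses the classical rule that retains population $i$ whenever $\bar X_i \ge \max_j \bar X_j - d\,\sigma/\sqrt{n}$, with $d$ chosen so that the probability of correct selection equals $P^*$ when all means are equal, i.e. $\int \Phi(z+d)^{p-1}\varphi(z)\,dz = P^*$. The function names, the Simpson-rule integration and the bisection search are illustrative choices, not taken from this paper.

```python
from statistics import NormalDist

STD = NormalDist()


def pcs_equal_means(d, p, lo=-8.0, hi=8.0, steps=2000):
    """P(correct selection) when all p means are equal:
    integral of Phi(z + d)**(p - 1) * phi(z) dz, via composite Simpson's rule."""
    h = (hi - lo) / steps  # steps must be even for Simpson's rule
    total = 0.0
    for j in range(steps + 1):
        z = lo + j * h
        weight = 1 if j in (0, steps) else (4 if j % 2 else 2)
        total += weight * STD.cdf(z + d) ** (p - 1) * STD.pdf(z)
    return total * h / 3


def gupta_constant(p, p_star, tol=1e-6):
    """Bisection for d such that pcs_equal_means(d, p) = p_star (needs 1/p < p_star < 1)."""
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pcs_equal_means(mid, p) < p_star:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def subset_select(xbar, sigma, n, d):
    """Keep every population whose sample mean is within d*sigma/sqrt(n) of the largest."""
    cutoff = max(xbar) - d * sigma / n ** 0.5
    return [i for i, x in enumerate(xbar) if x >= cutoff]


d = gupta_constant(p=4, p_star=0.90)
print(d, subset_select([1.2, 0.4, 1.0, -0.3], sigma=1.0, n=10, d=d))
```

Increasing $P^*$ or $p$ increases $d$ and therefore the expected size of the selected subset. The rule controls only the probability that the best population is included; it provides no estimate of the selected mean.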

Point estimation
Interval estimation
Coverage probability results
Selecting the best population
Selecting the top k populations
Post-selection confidence intervals
The unknown variance case
Numerical studies
Discussion
Lemma in Theorem 1
Findings
Proof of Theorem 2