Abstract
Consider an experiment in which $p$ independent populations $\pi_{i}$ with corresponding unknown means $\theta_{i}$ are available, and suppose that for every $1\leq i\leq p$, we can obtain a sample $X_{i1},\ldots,X_{in}$ from $\pi_{i}$. In this context, researchers are sometimes interested in selecting the $k$ populations that yield the largest sample means as a result of the experiment, and then estimating the corresponding population means $\theta_{i}$. In this paper, we present a frequentist approach to the problem and discuss how to construct simultaneous confidence intervals for the means of the $k$ selected populations, assuming that the populations $\pi_{i}$ are independent and normally distributed with a common variance $\sigma^{2}$. The method, based on the minimization of the coverage probability, obtains confidence intervals that attain the nominal coverage probability for any $p$ and $k$, taking into account the selection procedure.
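To see why the selection step has to be taken into account, the following minimal sketch simulates the setting described above and estimates the simultaneous coverage of naive, unadjusted normal intervals for the $k$ selected means. It is an illustration only, not the method of the paper: the function name, the default parameter values, the known $\sigma$, and the configuration in which all $\theta_{i}$ are equal to zero are assumptions made here.

```python
import random
from statistics import NormalDist


def naive_selected_coverage(p=10, k=2, n=5, sigma=1.0, level=0.95,
                            reps=4000, seed=0):
    """Estimate simultaneous coverage of naive z-intervals after selection.

    Assumptions for illustration: sigma is known and all p true means are 0.
    """
    rng = random.Random(seed)
    se = sigma / n ** 0.5                      # standard error of a sample mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided normal quantile
    half_width = z * se
    theta = [0.0] * p
    covered = 0
    for _ in range(reps):
        # The mean of n draws from N(theta_i, sigma^2) is N(theta_i, sigma^2 / n),
        # so each sample mean is drawn directly from that distribution.
        xbar = [rng.gauss(t, se) for t in theta]
        # Select the k populations with the largest sample means.
        selected = sorted(range(p), key=lambda i: xbar[i], reverse=True)[:k]
        # Simultaneous coverage: every selected interval must contain its mean.
        if all(abs(xbar[i] - theta[i]) <= half_width for i in selected):
            covered += 1
    return covered / reps


if __name__ == "__main__":
    print("naive coverage after selection:", naive_selected_coverage())
    print("coverage without selection:   ", naive_selected_coverage(p=1, k=1))
```

Under these assumptions the naive intervals fall well short of the nominal 95% level. With $p=10$ equal means, the largest standardized sample mean stays below the 97.5% normal quantile with probability $0.975^{10}\approx 0.78$, which bounds the simultaneous coverage from above. With $p=k=1$ no selection takes place and the estimate should sit close to 0.95.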
Highlights
Given a set of $p$ available features, researchers must often determine which one is the best, or rank them according to certain prespecified criteria.
Gupta and coauthors pioneered the subset selection approach, in which a subset of populations is selected so that it contains the population with the largest mean with probability at least $P^{*}$ [see 15]. Note that both of these approaches are mainly concerned with the correct selection of the population with the largest mean rather than with the estimation of the selected mean. This second problem has been widely discussed in the literature, and in the following two sections we present a brief summary of the main findings, giving separate consideration to the point estimation and interval estimation procedures.
Dahiya [12] addresses this problem for the case of two normal populations and proposes estimators that perform better in terms of mean squared error (MSE).
Summary
Given a set of $p$ available features, researchers must often determine which one is the best, or rank them according to certain prespecified criteria. For instance, researchers may be interested in determining which treatment is more effective in fighting a certain disease, or in ranking the levels of gene expression in a genomics experiment. Problems of this type are commonly referred to as ranking and selection problems, and specific solutions and methods have been proposed in the literature since the second half of the 20th century, with a start that is usually traced back to the pathbreaking works of Bechhofer [2] and Gupta & Sobel [16]. Gupta and coauthors pioneered the subset selection approach, in which a subset of populations is selected so that it contains the population with the largest mean with probability at least $P^{*}$ [see 15]. The related problem of estimating the mean of the selected population has been widely discussed in the literature, and in the following two sections we present a brief summary of the main findings, giving separate consideration to the point estimation and interval estimation procedures.