Empirical Bayes methods in high dimensions: a survey and ongoing debates
- Research Article
- 10.1007/s11771-020-4447-2
- Aug 1, 2020
- Journal of Central South University
A before-after study with the empirical Bayes (EB) method is the state-of-the-art approach for estimating crash modification factors (CMFs). The EB method not only addresses regression-to-the-mean bias but also improves accuracy. However, the performance of CMFs derived from the EB method has never been fully investigated. This study examines the accuracy of CMFs estimated with the EB method, using artificial realistic data (ARD) and real crash data to evaluate the CMFs. The results indicate that: 1) the CMFs derived from the EB before-after method are nearly the same as the true values; 2) the estimated CMF standard errors do not reflect the true values: the estimate remains at the same level regardless of the pre-assumed CMF standard error, so the EB before-after study is not sensitive to the variation of the CMF among sites; 3) analyses of real-world traffic and crash data with a dummy treatment indicate that the EB method tends to underestimate the standard error of the CMF. Safety researchers should recognize that the CMF variance may be biased when evaluating safety effectiveness with the EB method, and the algorithm for estimating CMF variance with the EB method should be revisited.
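The EB blend at the heart of this design can be sketched in a few lines. This is a minimal illustration of the standard EB before-after computation, assuming a negative-binomial safety performance function (SPF) with overdispersion parameter k; the function name and all numbers are hypothetical, not taken from the study.

```python
def eb_expected_crashes(observed, mu_spf, k):
    """EB estimate: a weighted blend of the SPF prediction and the site's
    own crash count; the weight w on the SPF grows as overdispersion k grows."""
    w = 1.0 / (1.0 + mu_spf / k)          # Hauer-style weight
    return w * mu_spf + (1.0 - w) * observed

# Hypothetical numbers: before-period count 12, SPF prediction 8, k = 2
eb_before = eb_expected_crashes(12, 8.0, 2.0)   # 0.2*8 + 0.8*12 = 11.2
expected_after = eb_before * (8.0 / 8.0)        # scale by exposure change
cmf = 6 / expected_after                        # observed-after / expected-after
# cmf < 1 suggests the treatment reduced crashes
```

The weighting is what removes regression-to-the-mean bias: a count far above the SPF prediction is partly discounted rather than taken at face value.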
- Research Article
- 10.1186/1471-2164-14-s8-s8
- Dec 1, 2013
- BMC Genomics
Background: Genome-wide association studies (GWAS) have identified hundreds of genetic variants associated with complex human diseases, clinical conditions and traits. Genetic mapping of expression quantitative trait loci (eQTLs) is providing novel functional effects of thousands of single nucleotide polymorphisms (SNPs). In a classical quantitative trait loci (QTL) mapping problem, multiple tests are performed to assess whether one trait is associated with a number of loci. In contrast to QTL studies, thousands of traits are measured along with thousands of gene expressions in an eQTL study, so an enormous number of tests must be performed. This extreme multiplicity gives rise to many computational and statistical problems. In this paper we address these issues using two closely related inferential approaches: an empirical Bayes method, which bears a Bayesian flavor without requiring much a priori knowledge, and the frequentist method of false discovery rates. A three-component t-mixture model is used for the parametric empirical Bayes (PEB) method, with inferences obtained using the Expectation/Conditional Maximization Either (ECME) algorithm. A simulation study was also performed and compared with a nonparametric empirical Bayes (NPEB) alternative. Results: The results show that PEB has an edge over NPEB. The proposed methodology was applied to human liver cohort (LHC) data; our method discovers more significant SNPs at FDR < 10% than the previous study by Yang et al. (Genome Research, 2010). Conclusions: In contrast to previously available methods based on p-values, the empirical Bayes method uses the local false discovery rate (lfdr) as the threshold, which controls the false positive rate.
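The lfdr thresholding described in the conclusions can be illustrated with a toy two-group normal mixture. The paper itself fits a three-component t-mixture via ECME; this simplified sketch fixes all mixture parameters by hand rather than estimating them, and only shows how the threshold operates.

```python
import math

def lfdr(z, pi0, mu1, sd1):
    """Local false discovery rate under a two-group normal mixture:
    null N(0, 1) with mass pi0, alternative N(mu1, sd1^2) with mass 1 - pi0."""
    def phi(x, m, s):
        return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
    f0 = phi(z, 0.0, 1.0)
    f = pi0 * f0 + (1.0 - pi0) * phi(z, mu1, sd1)
    return pi0 * f0 / f

# Declare a SNP-expression pair significant when lfdr < 0.10:
weak = lfdr(0.5, 0.9, 3.0, 1.5)    # close to 1: looks null
strong = lfdr(4.0, 0.9, 3.0, 1.5)  # well below 0.10: likely a real signal
```

Unlike a p-value, the lfdr is a posterior probability of being null given the observed statistic, so thresholding it directly controls the expected fraction of false calls among discoveries.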
- Research Article
- 10.1111/jbg.12191
- Nov 19, 2015
- Journal of Animal Breeding and Genetics
The linear mixed model (LMM) is one of the most popular methods for genome-wide association studies (GWAS). Numerous forms of LMM have been developed; however, two major issues in GWAS have not been fully addressed: (i) genomic background noise and (ii) low statistical power after Bonferroni correction. We propose an empirical Bayes (EB) method that assigns each marker effect a normal prior distribution, resulting in shrinkage estimates of marker effects. We found that this shrinkage approach selectively shrinks marker effects, reducing the noise level to zero for the majority of non-associated markers. At the same time, the EB method allows us to use an 'effective number of tests' to perform Bonferroni correction for multiple tests. Simulation studies for both human and pig data showed that the EB method can significantly increase statistical power compared with widely used exact GWAS methods such as GEMMA and FaST-LMM-Select. Real data analyses of human breast cancer identified improved detection signals for markers previously known to be associated with breast cancer. We therefore believe the EB method is a valuable tool for identifying the genetic basis of complex traits.
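In the simplest normal-normal case, the normal-prior shrinkage described here reduces to a closed-form posterior mean. The sketch below fixes the prior variance tau2 for illustration; in an EB method it would be estimated from the data, and the paper's selective shrinkage further depends on marker-specific variance estimation. All names and numbers are hypothetical.

```python
def eb_shrink(beta_hat, se, tau2):
    """Normal-normal EB posterior mean: shrinks a noisy estimated marker
    effect toward zero by the factor tau2 / (tau2 + se^2)."""
    return (tau2 / (tau2 + se ** 2)) * beta_hat

# tau2 would itself be estimated (e.g. by marginal likelihood); here it is
# fixed, and the effect estimates / standard errors are made up.
effects = [0.02, -0.01, 0.45]       # mostly noise plus one real signal
ses = [0.05, 0.05, 0.05]
shrunk = [eb_shrink(b, s, tau2=0.01) for b, s in zip(effects, ses)]
# each effect is multiplied by 0.01 / (0.01 + 0.0025) = 0.8
```

With a marker-specific tau2 near zero for non-associated markers, the same formula drives their estimates essentially to zero while leaving strong signals largely intact.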
- Research Article
- 10.1186/s12859-015-0641-x
- Jul 10, 2015
- BMC Bioinformatics
Background: DNA methylation offers an excellent example of how epigenetic information affects gene expression. β values and M values are commonly used to quantify DNA methylation. Statistical methods applicable to DNA methylation data analysis span a number of approaches, such as the Wilcoxon rank sum test, t-test, Kolmogorov–Smirnov test, permutation test, empirical Bayes method, and bump hunting method. Nonetheless, selecting an optimal statistical method can be challenging when different methods generate inconsistent results from the same data set. Results: We compared six statistical approaches relevant to DNA methylation microarray analysis in terms of false discovery rate (FDR) control, statistical power, and stability, through simulation studies and real data examples. Observable differences between β values and M values were noticed only when methylation levels were correlated across CpG loci. For small sample sizes (n = 3 or 6 per group), both the empirical Bayes and bump hunting methods showed appropriate FDR control and the highest power when methylation levels across CpG loci were independent; only the bump hunting method did so when methylation levels across CpG sites were correlated. For medium (n = 12 per group) and large (n = 24 per group) sample sizes, all methods compared had similar power, except for the permutation test whenever the proportion of differentially methylated loci was low. For all sample sizes, the bump hunting method had the lowest stability, in terms of the standard deviation of total discoveries, whenever the proportion of differentially methylated loci was large. Test power comparisons based on raw p-values from DNA methylation studies of ovarian cancer and rheumatoid arthritis were consistent with the simulation results. Overall, these results provide guidance for selecting an optimal statistical method under different scenarios. Conclusions: For DNA methylation studies with small sample sizes, the bump hunting and empirical Bayes methods are recommended when DNA methylation levels across CpG loci are independent, while only the bump hunting method is recommended when they are correlated. All methods are acceptable for medium or large sample sizes.
- Research Article
- 10.1002/pst.264
- Jan 1, 2008
- Pharmaceutical Statistics
Hierarchical models are widely used in medical research to structure complicated models and produce statistical inferences. In a hierarchical model, observations are sampled conditional on some parameters, and these parameters are sampled from a common prior distribution. Bayes and empirical Bayes (EB) methods have been applied effectively in analyzing these models. Despite many successes, parametric Bayes and EB methods may be sensitive to misspecification of the prior distribution. In this paper, without restricting the form of the prior distribution, we propose a nonparametric EB method to estimate the treatment effect of each group and develop a testing procedure to compare between-group differences. Simulation studies demonstrate that the proposed EB method is more efficient than some standard procedures. An illustrative example is provided with data from a clinical trial evaluating a new treatment for patients with stress urinary incontinence.
- Research Article
- 10.1016/bs.heslab.2024.11.001
- Jan 1, 2024
- Handbook of Labor Economics
Chapter 3 - Empirical Bayes methods in labor economics
- Research Article
- 10.1002/2016jc012506
- Mar 1, 2017
- Journal of Geophysical Research: Oceans
Tide-gauge data are one of the longest instrumental records of the ocean, but these data can be noisy, gappy, and biased. Previous studies have used empirical Bayes methods to infer the sea-level field from tide-gauge records but have not accounted for uncertainty in the estimation of model parameters. Here we compare them with a fully Bayesian method that accounts for uncertainty in model parameters, and demonstrate that empirical Bayes methods underestimate the uncertainty in sea level inferred from tide-gauge records. We use a synthetic tide-gauge data set to assess the skill of the empirical and full Bayes methods. The empirical-Bayes credible intervals on the sea-level field are narrower and less reliable than the full-Bayes credible intervals: the empirical-Bayes 95% credible intervals are 42.8% narrower on average than the full-Bayes 95% credible intervals; full-Bayes 95% credible intervals capture 95.6% of the true field values, while the empirical-Bayes 95% credible intervals capture only 77.1% of the true values, showing that parameter uncertainty has an important influence on the uncertainty of the inferred sea-level field. Most influential are uncertainties in model parameters for data biases (i.e., tide-gauge datums); letting data-bias parameters vary along with the sea-level process, but holding all other parameters fixed, the 95% credible intervals capture 92.8% of the true synthetic-field values. Results indicate that full Bayes methods are preferable for reconstructing sea-level estimates in cases where complete and accurate estimates of uncertainty are warranted.
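The narrowing effect of plugging in point estimates of hyperparameters, rather than integrating over them, can be seen in a one-dimensional toy example unrelated to the sea-level model itself: treating an estimated standard deviation as known gives a normal interval, while a fully Bayesian treatment with a flat prior yields a wider Student-t interval. The data below are made up.

```python
import math
import statistics

data = [1.2, 0.8, 1.5, 0.9, 1.1, 1.3]     # hypothetical observations
n, mean = len(data), statistics.fmean(data)
s = statistics.stdev(data)                 # sample standard deviation

# Plug-in ("empirical Bayes"-style): treat the estimated sigma as known.
z = 1.96                                   # normal 97.5% quantile
plug_in = (mean - z * s / math.sqrt(n), mean + z * s / math.sqrt(n))

# Fully Bayesian with a flat prior: sigma uncertainty widens the interval.
t5 = 2.571                                 # Student-t 97.5% quantile, 5 df
full = (mean - t5 * s / math.sqrt(n), mean + t5 * s / math.sqrt(n))

width_ratio = (full[1] - full[0]) / (plug_in[1] - plug_in[0])  # > 1
```

The mechanism is the same one the paper quantifies for tide gauges: credible intervals that condition on estimated hyperparameters are systematically too narrow, and so under-cover the truth.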
- Research Article
- 10.1155/2015/958206
- Jan 1, 2015
- Mathematical Problems in Engineering
Hotspot identification (HSID) is an important component of the highway safety management process. A number of methods have been proposed to identify hotspots. Among these, previous studies have indicated that the empirical Bayes (EB) method can outperform other methods, since it combines the historical crash records of a site with the expected number of crashes obtained from a safety performance function (SPF) for similar sites. However, SPFs are usually developed from a large number of sites, which may contain heterogeneity in traffic characteristics. As a result, the hotspot identification accuracy of EB methods can be affected by the SPF when heterogeneity is present in the crash data. It is therefore necessary to consider the heterogeneity and homogeneity of roadway segments when using EB methods. To address this problem, this paper proposes three classification-based EB methods to identify hotspots. Rural highway crash data collected in Texas were analyzed and classified into different groups using the proposed methods. Based on the modeling results for the Texas crash dataset, one of the proposed classification-based EB methods performs better than the standard EB method as well as other HSID methods.
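The core EB ranking step can be sketched as follows. This is a toy illustration with hypothetical sites and a single SPF, assuming the usual negative-binomial weight; it shows the standard EB method, not the classification-based variants proposed in the paper.

```python
def eb_estimate(observed, mu_spf, k):
    """Standard EB estimate: blend of the site's crash count and the SPF
    prediction, weighted by the negative-binomial overdispersion k."""
    w = 1.0 / (1.0 + mu_spf / k)
    return w * mu_spf + (1.0 - w) * observed

# Hypothetical sites: (observed crashes, SPF-expected crashes)
sites = {"A": (15, 4.0), "B": (9, 9.5), "C": (11, 3.0)}
k = 1.5
ranking = sorted(sites, key=lambda s: eb_estimate(*sites[s], k), reverse=True)
# Raw counts would rank A, C, B; EB ranks B above C because much of C's
# high count is attributed to regression to the mean, given its low SPF
# expectation.
```

The classification-based idea in the paper amounts to fitting separate SPFs (and hence separate `mu_spf` and `k` values) to more homogeneous groups of sites before this ranking step.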
- Research Article
- 10.3141/2136-03
- Jan 1, 2009
- Transportation Research Record: Journal of the Transportation Research Board
When effective programs to improve roadway safety are being developed, one of the primary tasks is to select sites for data collection. Selecting sites by ranking them simply by crash counts or crash rates is a common practice of transportation agencies because only crash data are required. Although the empirical Bayes (EB) method is a better option for site selection than this simple ranking method, the EB method requires additional data that might not be readily available or up to date, such as annual average daily traffic and roadway characteristics, and this requirement could hinder the implementation of any EB method. This research, sponsored by the Georgia Department of Transportation, is motivated by the need to develop a more effective site selection method. The contributions of this paper include (a) proposing a Poisson distribution-based wavelet shrinkage site selection (WASSS) method that can incorporate various wavelet shrinkage methods; (b) identifying a superior wavelet shrinkage method, the Bayesian Multiscale method (BMSM), for WASSS by evaluating various wavelet shrinkage methods; and (c) comparing the EB method with the proposed WASSS method. It is found that the proposed BMSM-based WASSS method produces a slightly better (or at least equal) level of performance compared with the EB method, in terms of rates of false negatives and false positives; in addition, the proposed method does not require additional data, as the EB method does. This study demonstrates that the proposed WASSS method is a promising site selection alternative that requires only crash data and performs acceptably.
- Research Article
- 10.1016/j.ahj.2005.07.008
- Nov 1, 2005
- American Heart Journal
Applicability of clinical prediction models in acute myocardial infarction: A comparison of traditional and empirical Bayes adjustment methods
- Research Article
- 10.19139/soic-2310-5070-1733
- Dec 19, 2023
- Statistics, Optimization & Information Computing
Although the name 'partial Bayes' was used earlier in a different context, its use in statistics began in 2021 (Banerjee and Seal, 2021), whereas the empirical Bayes method has been studied extensively for several decades. In this paper, the two methods are compared for the two-parameter gamma distribution with shape and scale parameters. As expected, the empirical Bayes method performs well in some cases. However, the partial Bayes method performs even better when the shape parameter is sufficiently small, i.e., when the variation in the data is small. Overall, the performances of the two methods do not differ greatly, but whenever there is prior information that the shape parameter is small, the partial Bayes method performs well. These results are supported by extensive simulation, and the performances of the two estimators are also compared on two real datasets.
- Research Article
- 10.3141/2019-05
- Jan 1, 2007
- Transportation Research Record: Journal of the Transportation Research Board
Observational before-and-after safety evaluations have been commonly used to determine the effectiveness of safety improvements applied to high-crash locations. Such evaluations may typically be affected by regression-to-mean (RTM) bias. This research compares the effectiveness of low-cost safety improvements applied to several high-crash intersections in the cities of Detroit and Grand Rapids, Michigan, and examines the RTM effects by using various safety evaluation methodologies. Some previous studies suggest that before-and-after studies are always biased. It is also claimed that the observed number of crashes cannot be used to determine the effectiveness of safety improvements, and the use of the empirical Bayes (EB) method is suggested. This research examines the RTM effects by using before-and-after studies, before-and-after studies with control sites, and two different variants of the EB method. The research reveals that the expected crash frequencies computed by various evaluation methods do not differ significantly when 3 to 5 years' worth of traffic crash data are used. The deviations of the expected crash frequencies with the before-and-after methods and the EB method are computed to compare the RTM effects for each additional year of traffic crash data used in the evaluation. This research reveals that before-and-after studies produce results similar to those of the EB method, and the RTM effect becomes insignificant when 3 or more years' worth of traffic crash data are used in the evaluation of high-crash locations.
- Research Article
- 10.1016/s0003-2670(00)85217-4
- Jan 1, 1986
- Analytica Chimica Acta
Bayesian calibration
- Research Article
- 10.1198/004017005000000085
- May 1, 2005
- Technometrics
We extend the usual implementation of u-control charts (uCCs) in two ways. First, we overcome the restrictive (and often inadequate) assumptions of the Poisson model; second, we eliminate the need for the questionable base period by using a sequential procedure. We use empirical Bayes (EB) and Bayes methods and compare them with the traditional frequentist implementation. EB methods are fairly easy to implement, and they deal nicely with extra-Poisson variability (while, at the same time, informally checking the adequacy of the Poisson assumption); however, they still need the base period. The sequential, full Bayes approach, on the other hand, avoids this drawback of traditional u-charts as well. Its implementation requires numerical simulation and the use of a prior distribution; several possibilities for both objective and informative priors are explored. We argue that the sequential, full Bayesian uCC is a powerful and versatile tool for process monitoring.
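The gamma-Poisson setup behind EB handling of extra-Poisson variability can be sketched by fitting the gamma prior by the method of moments, a common EB recipe; the paper's exact estimator may differ, and the counts below are hypothetical (with equal exposure per unit).

```python
counts = [3, 7, 2, 9, 4, 12, 5, 6]    # hypothetical defect counts per unit
n = len(counts)
m = sum(counts) / n                    # sample mean
v = sum((c - m) ** 2 for c in counts) / (n - 1)   # sample variance

# Under gamma(alpha, beta)-mixed Poisson: mean = alpha/beta,
# variance = mean + mean/beta, so v - m identifies the extra-Poisson part.
extra = max(v - m, 1e-9)               # clamp: v <= m means no overdispersion
beta = m / extra                       # gamma prior rate
alpha = m * beta                       # gamma prior shape (prior mean = m)

# Conjugate update: each unit's posterior mean rate shrinks its count toward m.
eb_rates = [(alpha + c) / (beta + 1.0) for c in counts]
```

Control limits based on `eb_rates` (or their posterior distributions) then reflect both Poisson noise and the estimated between-unit variability, which is what lets the EB chart cope with extra-Poisson variation.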
- Research Article
- 10.1016/j.bpj.2021.03.033
- Apr 1, 2021
- Biophysical Journal
Empirical Bayes method using surrounding pixel information for number and brightness analysis