Bayesian Analysis of Sample Selection and Endogenous Switching Regression Models with Random Coefficients Via MCMC Methods
This paper develops a Bayesian method for estimating and testing the parameters of endogenous switching regression models and sample selection models. Random coefficients are incorporated in both the decision and the regime regression models to reflect heterogeneity across individual units or clusters and correlation of observations within clusters. The case of tobit-type regime regression equations is also considered. A combination of Markov chain Monte Carlo methods, data augmentation, and Gibbs sampling is used to facilitate computation of Bayes posterior statistics. A simulation study is conducted to compare estimates from full and reduced blocking schemes and to investigate sensitivity to prior information. The Bayesian methodology is applied to data sets on currency hedging and goods trade, cross-country privatisation, and adoption of soil conservation technology. Estimation and inference results on marginal effects, the average decision or selection effect, and model comparison are presented. The expected decision effect is broken down into the average effect of an individual's decision on the response variable, the decision effect due to random components, and the differential effect due to latent correlated random components. Application of the proposed Bayesian MCMC algorithm to real data sets reveals that the normality assumption still holds for most commonly encountered economic data.
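The data-augmentation idea behind such samplers can be illustrated on the simplest piece of these models, a probit decision (selection) equation: the unobserved latent utility behind each binary decision is treated as an extra unknown and drawn alongside the coefficients. The following is a minimal, hypothetical Python sketch in the style of the well-known Albert and Chib probit sampler. It assumes a single covariate and a flat prior, and it omits random coefficients, regime equations, and tobit components; it is not the paper's algorithm, and all function names are invented for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def truncated_normal(mean, positive, rng):
    """Draw z ~ N(mean, 1) restricted to z > 0 (positive=True) or z <= 0, by inverse CDF."""
    p0 = STD_NORMAL.cdf(-mean)  # P(z <= 0) when z ~ N(mean, 1)
    u = rng.uniform(p0, 1.0) if positive else rng.uniform(0.0, p0)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf strictly inside (0, 1)
    return mean + STD_NORMAL.inv_cdf(u)


def probit_gibbs(x, d, n_iter=1500, burn_in=300, seed=1):
    """Data-augmentation Gibbs sampler for P(d=1 | x) = Phi(b0 + b1*x) with a flat prior."""
    rng = random.Random(seed)
    n = len(x)
    sx, sxx = sum(x), sum(xi * xi for xi in x)
    det = n * sxx - sx * sx
    v00, v01, v11 = sxx / det, -sx / det, n / det  # entries of (X'X)^{-1}
    l11 = math.sqrt(v00)                          # Cholesky factor of (X'X)^{-1}
    l21 = v01 / l11
    l22 = math.sqrt(v11 - l21 * l21)
    b0 = b1 = 0.0
    draws = []
    for it in range(n_iter):
        # Augmentation step: latent utility z_i | beta, d_i is a truncated normal.
        z = [truncated_normal(b0 + b1 * xi, di == 1, rng) for xi, di in zip(x, d)]
        # Parameter step: beta | z ~ N((X'X)^{-1} X'z, (X'X)^{-1}).
        sz = sum(z)
        sxz = sum(xi * zi for xi, zi in zip(x, z))
        m0, m1 = v00 * sz + v01 * sxz, v01 * sz + v11 * sxz
        e0, e1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        b0, b1 = m0 + l11 * e0, m1 + l21 * e0 + l22 * e1
        if it >= burn_in:
            draws.append((b0, b1))
    return draws


# Simulated data for illustration only (true b0 = -0.5, b1 = 1.0).
sim = random.Random(0)
x = [sim.gauss(0.0, 1.0) for _ in range(300)]
d = [1 if -0.5 + 1.0 * xi + sim.gauss(0.0, 1.0) > 0 else 0 for xi in x]
draws = probit_gibbs(x, d)
post_b0 = sum(b for b, _ in draws) / len(draws)
post_b1 = sum(b for _, b in draws) / len(draws)
print(round(post_b0, 2), round(post_b1, 2))
```

Conditioning on the latent utilities turns the probit update into an ordinary normal regression draw, which is why every full conditional stays easy to sample.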
- Research Article
508
- 10.1016/0304-4076(94)01720-4
- May 1, 1996
- Journal of Econometrics
On the choice between sample selection and two-part models
- Research Article
88
- 10.1002/hyp.10005
- Sep 11, 2013
- Hydrological Processes
Previous studies have drawn attention to substantial hydrological changes taking place in mountainous watersheds where hydrology is dominated by cryospheric processes. Modelling is an important tool for understanding these changes but is particularly challenging in mountainous terrain owing to scarcity of ground observations and uncertainty of model parameters across space and time. This study utilizes a Markov Chain Monte Carlo data assimilation approach to examine and evaluate the performance of a conceptual, degree‐day snowmelt runoff model applied in the Tamor River basin in the eastern Nepalese Himalaya. The snowmelt runoff model is calibrated using daily streamflow from 2002 to 2006 with fairly high accuracy (average Nash–Sutcliffe metric ~0.84, annual volume bias < 3%). The Markov Chain Monte Carlo approach constrains the parameters to which the model is most sensitive (e.g. lapse rate and recession coefficient) and maximizes model fit and performance. The average snowmelt contribution to total runoff in the Tamor River basin for the 2002–2006 period is estimated to be 29.7 ± 2.9% (which includes 4.2 ± 0.9% from snowfall that promptly melts), whereas 70.3 ± 2.6% is attributed to contributions from rainfall. On average, the elevation zone in the 4000–5500 m range contributes the most to basin runoff, averaging 56.9 ± 3.6% of all snowmelt input and 28.9 ± 1.1% of all rainfall input to runoff. Model-simulated streamflow using an interpolated precipitation data set decreases the fractional contribution from rainfall versus snowmelt compared with simulations using observed station precipitation.
Model experiments indicate that the hydrograph itself does not constrain estimates of snowmelt versus rainfall contributions to total outflow but that this derives from the degree‐day melting model. Lastly, we demonstrate that the data assimilation approach is useful for quantifying and reducing uncertainty related to model parameters and thus provides uncertainty bounds on snowmelt and rainfall contributions in such mountainous watersheds. Copyright © 2013 John Wiley & Sons, Ltd.
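The two fit statistics quoted in this abstract have standard definitions, and a degree-day model converts air temperature above a threshold into melt. A small Python sketch of these formulas is given here; the discharge values are hypothetical, and the generic degree-day form (melt = ddf × max(T − T_base, 0)) is an assumption for illustration, not necessarily the exact formulation of the snowmelt runoff model used in the study.

```python
def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2); 1.0 is a perfect fit."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def volume_bias_percent(observed, simulated):
    """Relative error in total flow volume, in percent (positive means overestimation)."""
    return 100.0 * (sum(simulated) - sum(observed)) / sum(observed)


def degree_day_melt(temp_c, ddf, t_base=0.0):
    """Generic degree-day melt (e.g. mm/day): ddf * max(T - T_base, 0)."""
    return ddf * max(temp_c - t_base, 0.0)


# Hypothetical daily discharges (m^3/s), for illustration only.
obs = [10.0, 12.0, 30.0, 25.0, 14.0]
sim = [11.0, 12.5, 27.0, 26.0, 13.5]
print(round(nash_sutcliffe(obs, sim), 3))       # 0.963
print(round(volume_bias_percent(obs, sim), 2))  # -1.1
print(degree_day_melt(5.0, 4.0))                # 20.0
```

An NSE near 1 and a volume bias near 0 are complementary: NSE rewards matching the day-to-day hydrograph, while the volume bias checks the water balance over the whole period.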
- Research Article
152
- 10.1111/1467-9876.00210
- Dec 1, 2000
- Journal of the Royal Statistical Society Series C: Applied Statistics
The analysis of infectious disease data presents challenges arising from the dependence in the data and the fact that only part of the transmission process is observable. These difficulties are usually overcome by making simplifying assumptions. The paper explores the use of Markov chain Monte Carlo (MCMC) methods for the analysis of infectious disease data, with the hope that they will permit analyses to be made under more realistic assumptions. Two important kinds of data sets are considered, containing temporal and non-temporal information, from outbreaks of measles and influenza. Stochastic epidemic models are used to describe the processes that generate the data. MCMC methods are then employed to perform inference in a Bayesian context for the model parameters. The MCMC methods used include standard algorithms, such as the Metropolis–Hastings algorithm and the Gibbs sampler, as well as a new method that involves likelihood approximation. It is found that standard algorithms perform well in some situations but can exhibit serious convergence difficulties in others. The inferences that we obtain are in broad agreement with estimates obtained by other methods where they are available. However, we can also provide inferences for parameters which have not been reported in previous analyses.
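To show the basic Metropolis–Hastings accept/reject step, here is a deliberately tiny Python sketch: a single infection probability p, estimated from a hypothetical outbreak in which k of n exposed individuals were infected, with a binomial likelihood and a uniform prior. This toy is an assumption for illustration only; the stochastic epidemic models in the paper have unobserved infection times and far richer structure. Because the exact posterior here is Beta(k + 1, n − k + 1), the sampler's output can be checked against it.

```python
import math
import random


def log_posterior(p, k, n):
    """Binomial log-likelihood for k infections among n exposed, uniform prior on p."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero prior density outside (0, 1)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def metropolis_hastings(k, n, n_iter=20000, step=0.1, p_start=0.5, seed=42):
    """Random-walk Metropolis-Hastings; returns the chain and the acceptance rate."""
    rng = random.Random(seed)
    p, lp = p_start, log_posterior(p_start, k, n)
    chain, accepted = [], 0
    for _ in range(n_iter):
        proposal = p + rng.gauss(0.0, step)  # symmetric proposal, so q cancels in the ratio
        lp_prop = log_posterior(proposal, k, n)
        # Accept with probability min(1, pi(proposal) / pi(p)), compared on the log scale.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            p, lp = proposal, lp_prop
            accepted += 1
        chain.append(p)
    return chain, accepted / n_iter


chain, acc_rate = metropolis_hastings(k=12, n=40)  # hypothetical outbreak counts
kept = chain[2000:]                                 # discard burn-in
post_mean = sum(kept) / len(kept)
print(round(post_mean, 3), round(acc_rate, 2))     # exact posterior mean is 13/42 ≈ 0.310
```

The step size trades acceptance rate against move length; in higher-dimensional epidemic models this tuning is one source of the convergence difficulties the abstract mentions.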
- Research Article
5
- 10.1007/s10994-015-5534-8
- Oct 20, 2015
- Machine Learning
Variable selection in high dimensional data is a challenging problem due to the exponential number of variable combinations, and Markov Chain Monte Carlo (MCMC) methods represent the state of the art to solve it. With genomics data this problem becomes even more difficult because there are generally more dimensions (variables) than points (records) leading to slow convergence and numerically unstable solutions. On the other hand, despite many alternative prototypes and languages, R remains a popular system to compute machine learning models. Unfortunately, R can be particularly slow with heavy matrix computations and the high number of iterations required by MCMC methods. Moreover, making R scale to large matrices, possibly beyond RAM, requires careful system integration. Recently, array DBMSs have opened the possibility of manipulating matrices of unlimited size. With such motivation in mind, we present algorithmic optimizations to accelerate the computation of variable selection in linear regression with the Gibbs sampler, a fundamental MCMC method. Such optimizations have the potential to accelerate other models. We study how to leverage the speed and scalability of the array DBMS to exploit our optimizations in R. We present a comprehensive experimental evaluation to assess time efficiency and model quality with a cancer data set containing RNA and miRNA variables to predict survival time. We show our optimized algorithm combining DBMS and R processing is significantly faster than R alone. We show our system allows fast joint analysis of RNA and miRNA variables, instead of analyzing them separately. Finally, we confirm our algorithm finds medically significant variables already identified in the biomedical literature. Our optimized MCMC method for the array DBMS can be easily called from R, leaving the final model within R runtime in RAM for further interpretation.
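The abstract does not spell out its optimizations, so the following Python sketch only illustrates one common idea under stated assumptions: summarize the data in a single pass into the matrix Γ = ZᵀZ with Z = [1, X, y], after which every Gibbs update of the variable-inclusion indicators works on this small (p + 2) × (p + 2) matrix instead of the n records. Zellner's g-prior with g = n, independent Bernoulli(1/2) inclusion priors, and all function names are assumptions rather than the authors' method; the R and array-DBMS integration is not shown.

```python
import math
import random


def summary_matrix(rows):
    """One pass over the records: Gamma = Z'Z with Z = [1, x_1, ..., x_p, y] per row."""
    d = len(rows[0]) + 1
    gamma = [[0.0] * d for _ in range(d)]
    for r in rows:
        z = [1.0] + list(r)
        for i in range(d):
            for j in range(d):
                gamma[i][j] += z[i] * z[j]
    return gamma


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, m):
            f = aug[r][c] / aug[c][c]
            for k in range(c, m + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (aug[r][m] - sum(aug[r][k] * x[k] for k in range(r + 1, m))) / aug[r][r]
    return x


def r_squared(gamma, subset):
    """R^2 of y on an intercept plus the predictors in subset, computed from Gamma only."""
    n, yi = gamma[0][0], len(gamma) - 1
    sst = gamma[yi][yi] - gamma[0][yi] ** 2 / n
    if not subset:
        return 0.0
    cols = [0] + [1 + j for j in subset]
    xtx = [[gamma[i][j] for j in cols] for i in cols]
    xty = [gamma[i][yi] for i in cols]
    beta = solve(xtx, xty)
    sse = gamma[yi][yi] - sum(b * v for b, v in zip(beta, xty))
    return 1.0 - sse / sst


def gibbs_select(gamma, n_iter=2000, burn_in=200, seed=0):
    """Gibbs sampler over inclusion indicators; returns posterior inclusion frequencies."""
    rng = random.Random(seed)
    n, p = gamma[0][0], len(gamma) - 2
    g = n  # unit-information g-prior (assumption)

    def log_marginal(subset):
        # Zellner g-prior log Bayes factor against the intercept-only model.
        r2 = r_squared(gamma, subset)
        return (0.5 * (n - 1 - len(subset)) * math.log(1 + g)
                - 0.5 * (n - 1) * math.log(1 + g * (1 - r2)))

    incl = [False] * p
    counts = [0] * p
    for it in range(n_iter):
        for j in range(p):
            incl[j] = True
            l1 = log_marginal([i for i in range(p) if incl[i]])
            incl[j] = False
            l0 = log_marginal([i for i in range(p) if incl[i]])
            # Bernoulli(1/2) prior odds cancel: P(include j | rest) = 1 / (1 + exp(l0 - l1)).
            incl[j] = rng.random() < 1.0 / (1.0 + math.exp(l0 - l1))
        if it >= burn_in:
            for j in range(p):
                counts[j] += incl[j]
    return [c / (n_iter - burn_in) for c in counts]


# Simulated records (x1, x2, x3, y) where only x1 matters (illustrative values).
gen = random.Random(1)
rows = []
for _ in range(200):
    x1, x2, x3 = gen.gauss(0, 1), gen.gauss(0, 1), gen.gauss(0, 1)
    rows.append((x1, x2, x3, 2.0 * x1 + gen.gauss(0, 1)))
gamma = summary_matrix(rows)
probs = gibbs_select(gamma)
print([round(q, 2) for q in probs])
```

Once Γ is built, the cost per Gibbs iteration depends only on p, not on n, which is the property that makes one-pass summarization attractive for large tables.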
- Research Article
4
- 10.1080/03610918.2021.1967985
- Aug 15, 2021
- Communications in Statistics - Simulation and Computation
Multilevel modeling is a modern approach for analyzing hierarchical or nested data structures; it can assess the variability between clusters. Bayesian Markov Chain Monte Carlo (MCMC) estimation methods are advanced methods applicable to multilevel models. However, the performance and properties of these estimation methods have not yet been systematically tested. This study compares Bayesian MCMC methods developed for multilevel models with a normally distributed response. The comparison is based on extensive simulations and an application to a real-life dataset. The performance of Gibbs sampling (GS) and Metropolis–Hastings (MH) methods is compared in a simulation study, and the factors that can affect the performance of both MCMC methods are identified. The practicality of these methods in real-world scenarios is confirmed by applying them to a dataset. In the simulations, although MH performs slightly better than Gibbs sampling, there is no evidence of significant differences between the methods except for small samples, where MH is superior. The results from the example are less clear than those from the simulations.
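For reference, the Gibbs side of this comparison can be sketched for the simplest normal multilevel model, a two-level random-intercept model y_ij = μ + u_j + e_ij with u_j ~ N(0, τ²) and e_ij ~ N(0, σ²). The Python sketch below assumes a flat prior on μ and vague inverse-gamma priors on σ² and τ² (the hyperparameters a0 = b0 = 0.01 are assumptions); every full conditional is then a standard distribution, whereas an MH sampler would instead propose values and accept or reject them. It is not the study's code.

```python
import random


def gibbs_random_intercept(groups, n_iter=3000, burn_in=500, a0=0.01, b0=0.01, seed=7):
    """Gibbs sampler for y_ij = mu + u_j + e_ij, u_j ~ N(0, tau2), e_ij ~ N(0, sigma2).

    groups is a list of clusters, each a list of responses. Priors: flat on mu,
    inverse-gamma(a0, b0) on sigma2 and tau2 (assumed hyperparameters).
    """
    rng = random.Random(seed)
    J = len(groups)
    N = sum(len(g) for g in groups)
    mu = sum(sum(g) for g in groups) / N
    u = [0.0] * J
    sigma2, tau2 = 1.0, 1.0
    draws = []
    for it in range(n_iter):
        # u_j | rest: normal; each cluster effect is shrunk toward 0.
        for j, g in enumerate(groups):
            v = 1.0 / (len(g) / sigma2 + 1.0 / tau2)
            m = v * sum(y - mu for y in g) / sigma2
            u[j] = rng.gauss(m, v ** 0.5)
        # mu | rest: normal centred on the mean of y_ij - u_j.
        resid_sum = sum(y - u[j] for j, g in enumerate(groups) for y in g)
        mu = rng.gauss(resid_sum / N, (sigma2 / N) ** 0.5)
        # sigma2 | rest and tau2 | rest: inverse gamma, drawn as 1 / Gamma(shape, scale=1/rate).
        sse = sum((y - mu - u[j]) ** 2 for j, g in enumerate(groups) for y in g)
        sigma2 = 1.0 / rng.gammavariate(a0 + N / 2, 1.0 / (b0 + sse / 2))
        ssu = sum(uj * uj for uj in u)
        tau2 = 1.0 / rng.gammavariate(a0 + J / 2, 1.0 / (b0 + ssu / 2))
        if it >= burn_in:
            draws.append((mu, sigma2, tau2))
    return draws


# Simulated data (illustrative values): mu = 5, tau = 1, sigma = 0.5, 30 clusters of 10.
gen = random.Random(3)
groups = []
for _ in range(30):
    u_true = gen.gauss(0.0, 1.0)
    groups.append([5.0 + u_true + gen.gauss(0.0, 0.5) for _ in range(10)])
draws = gibbs_random_intercept(groups)
post = [sum(d[k] for d in draws) / len(draws) for k in range(3)]
print([round(v, 2) for v in post])
```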
- Research Article
13
- 10.1109/access.2019.2935547
- Jan 1, 2019
- IEEE Access
In this paper, we propose a model-free volumetric Next Best View (NBV) algorithm, based on a Markov Chain Monte Carlo method, for accurate 3D reconstruction of high-mix-low-volume objects in manufacturing. The volumetric, information-gain-based NBV algorithm can select, in real time, the next optimal view that reveals the maximum uncertainty of the scanning environment with respect to a partially reconstructed 3D occupancy map, without any a priori knowledge of the target. Traditional occupancy grid maps make two independence assumptions for computational tractability but produce overconfident estimates of each voxel's occupancy probability, leading to less precise surface reconstructions. This paper uses the Gibbs sampler, a special case of the Markov Chain Monte Carlo (MCMC) method, to accurately estimate the posterior occupancy probability of a voxel by randomly sampling from its high-dimensional full posterior occupancy probability given the entire volumetric map, with respect to a forward sensor model with a Gaussian distribution. Numerical experiments under the ROS-Industry framework validate the performance of the MCMC Gibbs sampler algorithm, demonstrating the accuracy of the reconstructed occupancy map and the completeness of the registered point cloud. The proposed MCMC occupancy mapping could be used to optimise the tuning parameters of online NBV algorithms via the inverse sensor model to realise industrial automation.
- Book Chapter
195
- 10.1016/b978-0-444-53548-1.50003-9
- Jan 1, 2010
- Handbook of Financial Econometrics, Vol 2
CHAPTER 13 - MCMC Methods for Continuous-Time Financial Econometrics
- Research Article
84
- 10.2139/ssrn.480461
- Sep 23, 2010
- SSRN Electronic Journal
MCMC Methods for Continuous-Time Financial Econometrics
- Research Article
2
- 10.2139/ssrn.2553537
- Jan 23, 2015
- SSRN Electronic Journal
Markov Chain Monte Carlo Models, Gibbs Sampling, & Metropolis Algorithm for High-Dimensionality Complex Stochastic Problems
- Research Article
2
- 10.1023/a:1011673403421
- Jan 1, 2001
- Computational Economics
This study develops Bayesian methods for estimating the parameters of a stochastic switching regression model. Markov Chain Monte Carlo methods, data augmentation, and Gibbs sampling are used to facilitate estimation of the posterior means. The main feature of these methods is that the posterior means are estimated by the ergodic averages of samples drawn from conditional distributions, which are relatively simple in form and easier to sample from than the complex joint posterior distribution. A simulation study is conducted to compare model estimates obtained using data augmentation, Gibbs sampling, and the maximum likelihood EM algorithm, and to determine the effects of the accuracy and bias of the researcher's prior distributions on the parameter estimates.
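The central point of this abstract, that posterior means are ergodic averages of draws from simple conditional distributions, can be illustrated with a toy two-regime switching regression y_i = β_s x_i + e_i, where the regime s of each observation is unobserved and augmented as a latent variable. The Python sketch assumes a known error standard deviation, no intercepts, a Beta(1, 1) prior on the regime probability, and vague N(0, 100) priors on the slopes; these choices and all names are hypothetical simplifications of the model in the paper.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def switching_gibbs(x, y, sigma=0.5, n_iter=2000, burn_in=300, prior_var=100.0, seed=11):
    """Gibbs sampler for y_i = beta[s_i] * x_i + e_i, s_i in {0, 1} latent, sigma known."""
    rng = random.Random(seed)
    beta = [-1.0, 1.0]  # ordered starting values to reduce label switching (assumption)
    lam = 0.5           # P(s_i = 1)
    sums = [0.0, 0.0, 0.0]
    for it in range(n_iter):
        # Data augmentation: draw each unobserved regime indicator s_i.
        s = []
        for xi, yi in zip(x, y):
            log1 = math.log(lam) - 0.5 * ((yi - beta[1] * xi) / sigma) ** 2
            log0 = math.log(1.0 - lam) - 0.5 * ((yi - beta[0] * xi) / sigma) ** 2
            s.append(1 if rng.random() < sigmoid(log1 - log0) else 0)
        # Regime probability | s ~ Beta(1 + n1, 1 + n0) under a Beta(1, 1) prior.
        n1 = sum(s)
        lam = rng.betavariate(1 + n1, 1 + len(s) - n1)
        # Each slope | its regime's data: conjugate normal update with a N(0, prior_var) prior.
        for k in (0, 1):
            sxx = sum(xi * xi for xi, si in zip(x, s) if si == k)
            sxy = sum(xi * yi for xi, yi, si in zip(x, y, s) if si == k)
            prec = sxx / sigma ** 2 + 1.0 / prior_var
            beta[k] = rng.gauss((sxy / sigma ** 2) / prec, math.sqrt(1.0 / prec))
        # Accumulate ergodic averages after burn-in.
        if it >= burn_in:
            for idx, value in enumerate((beta[0], beta[1], lam)):
                sums[idx] += value
    kept = n_iter - burn_in
    return [total / kept for total in sums]


# Simulated data (illustrative values): beta = (-1.5, 2.0), P(regime 1) = 0.4.
gen = random.Random(5)
x = [gen.uniform(1.0, 3.0) for _ in range(200)]
y = [(2.0 if gen.random() < 0.4 else -1.5) * xi + gen.gauss(0.0, 0.5) for xi in x]
b0_hat, b1_hat, lam_hat = switching_gibbs(x, y)
print(round(b0_hat, 2), round(b1_hat, 2), round(lam_hat, 2))
```

Each conditional here (Bernoulli, beta, normal) is trivial to sample, even though the joint posterior over all regime labels and parameters is not.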
- Book Chapter
- 10.1007/978-3-7908-1782-9_49
- Jan 1, 2002
This paper presents an overview of Markov Chain Monte Carlo (MCMC) methods for statistical inference and their applications. The article begins by describing ordinary Monte Carlo methods, which in principle have the same goals as MCMC but are difficult to implement in practice. Next, basic Markov Chain Monte Carlo is discussed, which is founded on the Hastings algorithm and includes the Metropolis method and the Gibbs sampler as special cases. Finally, various special applications of Markov Chain Monte Carlo methods are briefly mentioned, and some recent developments in MCMC are covered in the final remarks section.
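The statement that the Metropolis method and the Gibbs sampler are special cases of the Hastings algorithm follows directly from the Metropolis–Hastings acceptance probability. A short derivation, in standard notation (target density π, proposal density q) assumed here rather than taken from the chapter:

```latex
\alpha(x, x') = \min\left\{1,\; \frac{\pi(x')\,q(x \mid x')}{\pi(x)\,q(x' \mid x)}\right\}

% Metropolis: a symmetric proposal, q(x' \mid x) = q(x \mid x'), reduces the ratio to \pi(x') / \pi(x).

% Gibbs: update component j with q(x' \mid x) = \pi(x'_j \mid x_{-j}) and x'_{-j} = x_{-j}. Then
\frac{\pi(x')\,q(x \mid x')}{\pi(x)\,q(x' \mid x)}
  = \frac{\pi(x_{-j})\,\pi(x'_j \mid x_{-j})\,\pi(x_j \mid x_{-j})}
         {\pi(x_{-j})\,\pi(x_j \mid x_{-j})\,\pi(x'_j \mid x_{-j})} = 1,
\quad \text{so } \alpha = 1 \text{ and every Gibbs draw is accepted.}

% In both cases the transition kernel P satisfies detailed balance:
\pi(x)\,P(x, x') = \pi(x')\,P(x', x)
```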
- Research Article
4
- 10.1554/br06-6.1
- Jan 1, 2006
- Evolution
Statistical Methods in Molecular Evolution, edited by Rasmus Nielsen, contains a wide survey of current research in molecular evolution. It is organized into sections (introduction, program “tutorials,” models, and inference), a setup that constitutes a gentle introduction to the topic for mathematically inclined readers. For practical biologists, the start might be somewhat more challenging, although the introduction is tailored to a mixed audience. All chapters are written by researchers with active research projects in the areas they write about. I address each chapter with a short comment.
Introduction. The introductory material sets the stage for all further chapters. Without going into too much depth, the authors give a broad overview of topics such as Markov chain-based substitution models, the likelihood concept, Markov chain Monte Carlo (MCMC) methods, and population genetic aspects of molecular evolution. (1) “Markov Models in Evolution”: Galtier, Gascuel, and Jean-Marie give a crash course on Markov models that will leave mathematicians happily humming along and many biologists struggling with the mathematical syntax. The discussion of population models of DNA, RNA, and protein sequence evolution is concise but lacks the presentation of the transition probabilities for some of the models. Readers who want to familiarize themselves with these models still need to read Felsenstein (2004) and Swofford et al. (1996). (2) “Introduction to Applications of the Likelihood Function in Molecular Evolution”: Buschbom and von Haeseler give an overview of the likelihood principle. Several examples of application of the maximum-likelihood principle, from simple one-parameter inferences to complicated many-parameter problems such as finding the best tree given a set of sampled sequences, are given. The difficulties inherent in likelihood ratio testing receive too little attention.
It would have been useful to read about difficulties with testing of hypotheses, taking into account boundary conditions of the parameters. For example, how should one test if a branch length in a phylogenetic tree is zero? And should this be used as a means of judging support for the tree? Given that this book will have a much higher profile than a single paper, coverage of such topics would have been helpful to many readers. (3) “Introduction to Markov Chain Monte Carlo Methods in Molecular Evolution”: Larget gives a brief introduction to MCMC sampling, using a Bayesian approach exclusively. The Gibbs sampler, a special case of the Metropolis-Hastings (MH) sampler, is explained in detail. Regarding phylogenetics and population
- Research Article
49
- 10.1186/bf03351676
- Aug 1, 2001
- Earth, Planets and Space
This paper presents a practical and objective procedure for Bayesian inversion of geophysical data. We have applied geostatistical techniques such as kriging and simulation algorithms to acquire prior model information. The Markov chain Monte Carlo (MCMC) method is then adopted to infer the characteristics of the marginal distributions of the model parameters. Geostatistics, which is based on a variogram model, provides a means to analyze and interpret spatially distributed data. For Bayesian inversion of dipole-dipole resistivity data, we have used indicator kriging and simulation techniques to generate cumulative distribution functions from Schlumberger and well-logging data, obtaining prior information by cokriging and simulations from covariogram models. Indicator approaches make it possible to incorporate non-parametric information into the probability density function. We have also adopted the MCMC approach, based on Gibbs sampling, to examine the characteristics of the posterior probability density function and the marginal distributions of each parameter. The MCMC technique provides a robust result from which the information given by the fundamentally non-parametric indicator method is fully extracted. We have used the prior information provided by the geostatistical method as the full conditional distribution for Gibbs sampling. To implement the Gibbs sampler, we have applied a modified Simulated Annealing (SA) algorithm, which effectively searches the global model space. This scheme provides a more effective and robust global sampling algorithm than that of the previous study.
- Research Article
5
- 10.3233/sji-200655
- Jan 1, 2020
- Statistical Journal of the IAOS
The Markov Chain Monte Carlo (MCMC) method, implemented with Gibbs sampling, has become a popular way to obtain information about a probability distribution and, in particular, to estimate posterior distributions. Standard methods such as maximum likelihood and logistic ridge regression have been used as benchmarks for comparison with MCMC. Maximum likelihood is the classical method for estimating the parameters of a logistic regression model, obtained by differentiating the log-likelihood function with respect to the parameters. Logistic ridge regression depends on the choice of a ridge parameter in the penalty function, selected by cross-validation. This paper applies maximum likelihood, logistic ridge regression, and MCMC to estimate the parameters of the logit function, which are then transformed into probabilities. The logistic regression model predicts the probability of observing a phenomenon. Prediction accuracy is evaluated as the percentage of correct predictions of a binary event. A simulation study generates a binary response variable using 2, 4, and 6 explanatory variables drawn from a multivariate normal distribution with positive and negative correlation coefficients, that is, under multicollinearity. The methods are compared on the basis of maximum predictive accuracy. The results show that MCMC performs satisfactorily in all situations considered.
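Two of the estimators compared here differ mainly in their objective: maximum likelihood maximizes the log-likelihood, while logistic ridge regression maximizes the log-likelihood minus a penalty (λ/2)·β². A minimal Python sketch for one covariate, fitted by Newton–Raphson, follows. Leaving the intercept unpenalized, using a fixed λ (instead of a cross-validated one), and predicting 1 when the fitted probability exceeds 0.5 are assumptions; the MCMC estimator is not shown.

```python
import math
import random


def fit_logistic(x, y, ridge=0.0, n_steps=25):
    """Newton-Raphson for P(y=1 | x) = 1 / (1 + exp(-(b0 + b1*x))).

    ridge > 0 maximizes loglik - 0.5 * ridge * b1**2 (intercept unpenalized);
    ridge = 0 gives the ordinary maximum likelihood estimate.
    """
    b0 = b1 = 0.0
    for _ in range(n_steps):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # d loglik / d b0
            g1 += (yi - p) * xi     # d loglik / d b1
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        g1 -= ridge * b1            # gradient of the penalty term
        h11 += ridge                # curvature of the penalty term
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (X'WX + ridge * diag(0, 1))^{-1} * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def accuracy_percent(x, y, b0, b1):
    """Percentage of correct 0/1 predictions, predicting 1 when P(y=1 | x) > 0.5."""
    correct = sum((1 if b0 + b1 * xi > 0 else 0) == yi for xi, yi in zip(x, y))
    return 100.0 * correct / len(y)


# Simulated data (illustrative values): true b0 = 0.3, b1 = 1.5.
gen = random.Random(2)
x = [gen.gauss(0.0, 1.0) for _ in range(400)]
y = [1 if gen.random() < 1.0 / (1.0 + math.exp(-(0.3 + 1.5 * xi))) else 0 for xi in x]
ml_fit = fit_logistic(x, y)
ridge_fit = fit_logistic(x, y, ridge=20.0)
print(ml_fit, accuracy_percent(x, y, *ml_fit))
print(ridge_fit, accuracy_percent(x, y, *ridge_fit))
```

The ridge penalty shrinks the slope toward zero, which stabilizes estimates under multicollinearity at the cost of some bias.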
- Book Chapter
- 10.1093/oso/9780198841296.003.0016
- May 23, 2019
This chapter introduces Markov Chain Monte Carlo (MCMC) with Gibbs sampling, revisiting the “Maple Syrup Problem” of Chapter 12, where the goal was to estimate the two parameters of a normal distribution, μ and σ. Chapter 12 used the normal-normal conjugate to derive the posterior distribution for the unknown parameter μ; the parameter σ was assumed to be known. This chapter uses MCMC with Gibbs sampling to estimate the joint posterior distribution of both μ and σ. Gibbs sampling is a special case of the Metropolis–Hastings algorithm. The chapter describes MCMC with Gibbs sampling step by step, which requires (1) computing the posterior distribution of a given parameter, conditional on the value of the other parameter, and (2) drawing a sample from the posterior distribution. In this chapter, Gibbs sampling makes use of the conjugate solutions to decompose the joint posterior distribution into full conditional distributions for each parameter.
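The two-step scheme described above can be sketched in Python for normal data with unknown mean μ and precision τ = 1/σ². Assuming a normal prior on μ and a gamma prior on τ (the hyperparameters below are vague placeholder values, not the chapter's), both full conditionals are conjugate: μ | τ, y is normal and τ | μ, y is gamma. The data are simulated, since the chapter's Maple Syrup data are not reproduced here.

```python
import random


def gibbs_normal(y, n_iter=5000, burn_in=500, m0=0.0, p0=1e-4, a0=0.01, b0=0.01, seed=3):
    """Gibbs sampler for y_i ~ N(mu, 1/tau) with mu ~ N(m0, 1/p0) and tau ~ Gamma(a0, rate=b0)."""
    rng = random.Random(seed)
    n, sum_y = len(y), sum(y)
    mu, tau = sum_y / n, 1.0
    draws = []
    for it in range(n_iter):
        # mu | tau, y is normal: a precision-weighted mix of the prior mean and the data.
        prec = p0 + n * tau
        mu = rng.gauss((p0 * m0 + tau * sum_y) / prec, (1.0 / prec) ** 0.5)
        # tau | mu, y is gamma; random.gammavariate takes shape and scale = 1/rate.
        ss = sum((yi - mu) ** 2 for yi in y)
        tau = rng.gammavariate(a0 + n / 2, 1.0 / (b0 + ss / 2))
        # Keep the draw, reported as (mu, sigma = 1/sqrt(tau)), after burn-in.
        if it >= burn_in:
            draws.append((mu, tau ** -0.5))
    return draws


gen = random.Random(8)
y = [gen.gauss(10.0, 2.0) for _ in range(30)]  # simulated data, not the chapter's
draws = gibbs_normal(y)
post_mu = sum(m for m, _ in draws) / len(draws)
post_sigma = sum(s for _, s in draws) / len(draws)
print(round(post_mu, 2), round(post_sigma, 2))
```

Because the prior on μ is nearly flat, its posterior mean should sit very close to the sample mean; with informative priors the same code shows how each conditional blends prior and data.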