Abstract

Variable selection is an essential and necessary task in statistical modeling. Several studies have tried to develop and standardize the variable selection process, but doing so is difficult. The first question researchers need to ask themselves is which variables are most significant for describing a given dataset's response. In this paper, a new method for variable selection using Gibbs sampler techniques is developed. First, the model is defined, and the posterior distributions for all the parameters are derived. The new variable selection method is tested on four simulated datasets and compared with some existing techniques: Ordinary Least Squares (OLS), the Least Absolute Shrinkage and Selection Operator (Lasso), and Tikhonov regularization (Ridge). The simulation studies show that our method performs better than the others in terms of both error and time complexity. These methods are then applied to a real dataset, the Rock Strength Dataset. The new approach implemented using the Gibbs sampler is more powerful and effective than the other approaches. All statistical computations for this paper were performed using R version 4.0.3 on a single-processor computer.

Highlights

  • Forward and backward selection methods are used to select the best subsets of variables by following a sequence of steps [3]. These methods are slow on large datasets [4]

  • The true values of the parameters are close to the estimated parameters. The covariates associated with β0, β2, β4, β6 and β7 were selected as the most significant covariates because their estimates were close to the true model coefficients, as shown in Table 1. In the Least Absolute Shrinkage and Selection Operator (Lasso) and Ridge methods, all the covariates were selected as important variables. Computationally, selecting all the variables as important is inefficient because both the error and the running time increase for large datasets

  • The Gibbs sampler has been discussed in this article. The posterior distributions for β and σ2 have been derived, and the Gibbs sampler algorithm is used to sample from the corresponding distributions
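The sampling scheme in the highlights — alternately drawing β and σ² from their full conditionals and thinning the chain — can be sketched as follows. This is a minimal illustration in Python/numpy (the paper's own computations were done in R), and the conjugate priors β ~ N(0, τ²I) and σ² ~ Inverse-Gamma(a0, b0) are assumptions for the sketch; the article derives its own posterior distributions.

```python
import numpy as np

def gibbs_linear_regression(X, y, n_iter=2000, thin=5,
                            tau2=100.0, a0=2.0, b0=1.0, seed=0):
    """Gibbs sampler for y = X @ beta + eps, eps ~ N(0, sigma2 * I).

    Assumed priors (illustrative, not the paper's exact specification):
      beta   ~ N(0, tau2 * I)
      sigma2 ~ Inverse-Gamma(a0, b0)
    Returns thinned posterior draws of beta and sigma2.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    beta, sigma2 = np.zeros(p), 1.0
    draws_beta, draws_sigma2 = [], []
    for it in range(n_iter):
        # Full conditional: beta | sigma2, y ~ N(m, V)
        V = np.linalg.inv(XtX / sigma2 + np.eye(p) / tau2)
        m = V @ (Xty / sigma2)
        beta = rng.multivariate_normal(m, V)
        # Full conditional: sigma2 | beta, y ~ Inv-Gamma(a0 + n/2, b0 + RSS/2)
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / (b0 + resid @ resid / 2))
        # Keep every `thin`-th draw to reduce autocorrelation between samples
        if it % thin == 0:
            draws_beta.append(beta)
            draws_sigma2.append(sigma2)
    return np.array(draws_beta), np.array(draws_sigma2)
```

Summarizing `draws_beta` by its column means gives the posterior-mean estimates that the article compares against the true coefficients.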


Summary

Introduction

Simulated samples are thinned, keeping every 5th sample, to reduce the correlation between samples. Both the Gibbs sampler and the Lasso method are used to identify the most important variables among the 8 variables. Parameters are summarized by their corresponding posterior means, and some of them are very good estimators of the corresponding true values; the true values of the parameters are close to the estimated parameters. The covariates associated with β0, β2, β4, β6 and β7 were selected as the most significant covariates because their estimates were close to the true model coefficients, as shown in Table 1. In the Lasso and Ridge methods, all the covariates were selected as important variables; computationally, this is inefficient because both the error and the running time increase for large datasets. Boxplots are plotted, and in Fig. 3b some outliers in the dataset are identified, so they are removed before running the Gibbs and Lasso variable selection methods. The correlation matrix for the 8 predictors in the real dataset (RSD) is given in Fig. 4. Figure 4: correlation matrix for the 8 predictors in RSD and their distributions.
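The selection step above — declaring a covariate significant when its posterior draws concentrate away from zero — can be sketched as follows. The 95% equal-tailed credible-interval rule used here is an assumption for illustration; the article does not state its exact selection criterion in this summary.

```python
import numpy as np

def select_variables(beta_draws, level=0.95):
    """Select covariates whose equal-tailed credible interval excludes zero.

    beta_draws: (n_draws, p) array of posterior samples of beta
                (e.g. the thinned output of a Gibbs sampler).
    Returns the indices of the selected covariates.
    """
    alpha = (1.0 - level) / 2.0
    lo = np.quantile(beta_draws, alpha, axis=0)       # lower interval endpoint
    hi = np.quantile(beta_draws, 1.0 - alpha, axis=0)  # upper interval endpoint
    # A covariate is selected when its whole interval is on one side of zero
    return np.where((lo > 0) | (hi < 0))[0]
```

Unlike a weakly penalized Lasso or Ridge fit, which can retain every covariate with a small nonzero coefficient, this rule discards covariates whose posterior mass straddles zero, which is the behavior the summary contrasts with Lasso and Ridge.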

