Comparison of variable selection methods in partial least squares regression

Tahir Mehmood, Solve Sæbø, Kristian Hovde Liland

doi:10.1002/cem.3226

Abstract

Through remarkable progress in technology, it is becoming ever easier to generate vast numbers of variables from a given sample. Variable selection is imperative for data reduction and for understanding the modeled relationship. Partial least squares (PLS) regression is among the modeling approaches that address high-throughput data. A considerable list of variable selection methods has been introduced in PLS, and most of these methods have been reviewed in a recent study. Motivated by this, we have conducted a comparison of the available methods for variable selection within PLS. The main focus of this study was to reveal patterns of dependency between variable selection method and data properties, which can guide the choice of method in practical data analysis. To this end, a simulation study was conducted with data sets having diverse properties, such as the number of variables, the number of samples, model complexity level, and information content. The results indicate that these factors (the number of variables, the number of samples, model complexity level, and information content), together with the variant of PLS method and their mutual higher-order interactions, all significantly affect the prediction capabilities of the model and the choice of variable selection strategy.

Highlights

Thanks to the massive use of data generation technologies (spectroscopy, RNAs, satellite images, brain images, etc.), a huge amount of data is created in many real-life applications.
This study compares 17 variable selection methods in partial least squares (PLS) regression.

Summary

Introduction

Thanks to the massive use of data generation technologies (spectroscopy, RNAs, satellite images, brain images, etc.), a huge amount of data is created in many real-life applications. These technologies enable economical, speedy, and efficient generation of information (variables) about given objects (samples). To understand the complexity behind such high-dimensional data sets, multivariate approaches must be considered. The negative aspect of data generation technologies is the inclusion of irrelevant variables. These irrelevant variables degrade model performance, increase model complexity, and make the modeled relations harder to interpret. Exclusion of irrelevant variables is therefore important [1,2,3,4].
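To make the idea of excluding irrelevant variables concrete, the following is a minimal sketch and not the procedure used in the study. It assumes a toy simulation in which only the first few variables influence the response, and it ranks variables by the absolute loading weights of the first PLS1 component (proportional to the covariance between each centered variable and the centered response). The function names `simulate`, `first_loading_weights`, and `select_top_k`, the simulation settings, and the top-k filter rule are all illustrative assumptions; the rule is not necessarily one of the 17 methods compared.

```python
import math
import random


def simulate(n_samples, n_variables, n_relevant, noise_sd, seed=0):
    """Toy data: only the first n_relevant columns of X influence y (assumption)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_variables)]
         for _ in range(n_samples)]
    y = [sum(row[:n_relevant]) + rng.gauss(0.0, noise_sd) for row in X]
    return X, y


def first_loading_weights(X, y):
    """Unit-norm loading weights of the first PLS1 component: w proportional to Xc' yc."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    w = [sum((X[i][j] - x_mean[j]) * (y[i] - y_mean) for i in range(n))
         for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]


def select_top_k(weights, k):
    """Filter-style rule (illustrative): keep the k variables with largest |weight|."""
    ranked = sorted(range(len(weights)),
                    key=lambda j: abs(weights[j]), reverse=True)
    return sorted(ranked[:k])


# Hypothetical settings: 500 samples, 50 variables, 5 of them relevant.
X, y = simulate(n_samples=500, n_variables=50, n_relevant=5, noise_sd=0.5)
w = first_loading_weights(X, y)
selected = select_top_k(w, k=5)
print("selected variable indices:", selected)
```

Varying `n_samples`, `n_variables`, `n_relevant`, and `noise_sd` in this sketch loosely mirrors the kind of data properties the study varies, although the actual simulation design and selection methods are those described in the paper, not the ones assumed here.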

Journal: Journal of Chemometrics
Publication Date: Feb 20, 2020
Citations: 128
License type: CC BY 4.0
