Abstract
In high-dimensional data, penalized regression is often used for variable selection and parameter estimation. However, these methods typically require time-consuming cross-validation to select tuning parameters and tend to retain more false positives in high dimensions. This chapter discusses sparse-boosting-based machine learning methods for the following high-dimensional problems. First, a sparse boosting method for selecting important biomarkers is studied for right-censored survival data with high-dimensional biomarkers. Then, a two-step sparse boosting method for variable selection and model-based prediction is studied for high-dimensional longitudinal observations measured repeatedly over time. Finally, a multi-step sparse boosting method for identifying patient subgroups that exhibit different treatment effects is studied for high-dimensional dense longitudinal observations. This chapter aims to improve the accuracy and computational speed of variable selection and parameter estimation in high-dimensional data, to expand the application scope of sparse boosting, and to develop new methods for high-dimensional survival analysis, longitudinal data analysis, and subgroup analysis, which have broad application prospects.
Highlights
High-dimensional models have become very popular in the statistical literature, and many new machine learning techniques have been developed to deal with data with a very large number of features.
Variable selection is crucial to addressing these challenges. Regularization procedures such as the LASSO [1], the smoothly clipped absolute deviation (SCAD) [2], the MCP [3], and their various extensions [4–6] have been thoroughly studied and are widely used to perform variable selection and estimation simultaneously, improving the prediction accuracy and interpretability of the statistical model.
In-sample prediction errors using L2 boosting are slightly smaller than those using sparse boosting, since the former yields larger model sizes; however, the average of the root mean integrated squared errors using sparse boosting is much smaller than that using L2 boosting.
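To make the comparison above concrete, the following is a minimal NumPy sketch of componentwise L2 boosting: at each step, the residuals are refit with the single best-fitting predictor and a small step is taken toward it. This is an illustration of the generic algorithm, not the chapter's own implementation; the simulated data, step size `nu`, and step count are arbitrary choices. Sparse boosting differs by choosing the predictor (and stopping point) with a complexity-penalized criterion rather than raw residual fit, which is what yields the smaller models noted above.

```python
import numpy as np

def l2_boost(X, y, n_steps=200, nu=0.1):
    """Componentwise L2 boosting with shrinkage factor nu.

    At each step, fit the current residuals by simple least squares on
    every column separately, pick the column with the smallest residual
    sum of squares, and move its coefficient a fraction nu of the way.
    """
    n, p = X.shape
    coef = np.zeros(p)
    resid = y.astype(float).copy()
    for _ in range(n_steps):
        # univariate least-squares slope of the residuals on each column
        b = X.T @ resid / (X ** 2).sum(axis=0)
        # residual sum of squares for each candidate one-variable fit
        sse = ((resid[:, None] - X * b) ** 2).sum(axis=0)
        j = int(np.argmin(sse))  # sparse boosting would penalize model size here
        coef[j] += nu * b[j]
        resid -= nu * b[j] * X[:, j]
    return coef

# Toy example: only the first two of 30 features carry signal.
rng = np.random.default_rng(1)
n, p = 100, 30
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.standard_normal(n)
coef = l2_boost(X, y)
```

Because every step touches only one coordinate, early stopping acts as implicit regularization: running fewer steps gives sparser, more shrunken coefficient vectors.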
Summary
High-dimensional models have become very popular in the statistical literature, and many new machine learning techniques have been developed to deal with data with a very large number of features. Regularization procedures such as the LASSO [1], the smoothly clipped absolute deviation (SCAD) [2], the MCP [3], and their various extensions [4–6] have been thoroughly studied and are widely used to perform variable selection and estimation simultaneously, improving the prediction accuracy and interpretability of the statistical model. This chapter aims to improve the accuracy and computational speed of variable selection and parameter estimation in high-dimensional data, to expand the application scope of sparse boosting, and to develop new methods for high-dimensional survival analysis, longitudinal data analysis, and subgroup analysis, which have broad application prospects.
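As a brief illustration of the regularization-based selection the summary refers to, the sketch below fits the LASSO to simulated data in which only the first three of fifty features are truly active. It uses scikit-learn's `Lasso` estimator; the penalty level `alpha=0.1` and the simulated coefficients are arbitrary choices for the example, not values from the chapter.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 50
X = rng.standard_normal((n, p))

beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]  # only the first 3 features are truly active
y = X @ beta + 0.1 * rng.standard_normal(n)

# The L1 penalty shrinks most coefficients exactly to zero,
# performing variable selection and estimation simultaneously.
model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)
```

In practice `alpha` is chosen by cross-validation (e.g., `LassoCV`), which is exactly the time-consuming tuning step that the sparse boosting methods in this chapter seek to avoid.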