Abstract

Gene expression profiling is a useful technique for analyzing cellular function, and gene expression profiles are widely studied in human cancer research. Gene expression data usually consist of a very large number of features and a relatively small number of samples, so extracting a small number of important features from these data is a major challenge in gene expression-based cancer research. In this paper, we propose an embedded feature selection algorithm built on boosted linear regression. Boosting is applied to derive an ensemble feature selector and to improve the performance of linear regression-based feature selection. The proposed algorithm, called boosted regression-based feature selection for the multilayer perceptron (BREG-MLP), repeats the boosted feature selection process to extract the smallest feature subset that still maintains good classification performance. We apply BREG-MLP to six human cancer-related gene expression data sets to extract important features, and we confirm that it offers improved performance compared to single regression-based feature selection methods.

Highlights

  • Gene expression profiling is useful for understanding cellular function by visualizing the expression patterns of thousands of genes at the transcription level at specific times

  • We propose an embedded feature selection algorithm based on linear regression and a neural network, called boosted regression-based feature selection for the multilayer perceptron (BREG-MLP)

  • A linear regression method called the least absolute shrinkage and selection operator (LASSO) and MLP configuration were integrated for embedded feature selection, and the feature selection procedure was repeatedly performed to improve the performance of feature selection [12]
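
For reference, LASSO is commonly written as the penalized least-squares problem below. The notation is assumed here rather than taken from the paper: X is the n-by-p expression matrix, y the response, β the coefficient vector, and λ ≥ 0 the penalty weight.

```latex
\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^{p}} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1
```

The L1 penalty drives many coefficients exactly to zero, so the features with nonzero coefficients can be read off as the selected subset without transforming the raw data.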

Summary

INTRODUCTION

Cho: Cancer-Related Gene Signature Selection Based on Boosted Regression for MLP.

Gene expression profiling is useful for understanding cellular function by visualizing the expression (activity) patterns of thousands of genes at the transcription level at specific times. A method for identifying cancer types from gene expression data was proposed in [4]. In gene expression profile analysis, however, dimension reduction transforms the raw expression values, which makes it difficult to identify genes that are important in terms of the biological pathways associated with cancer. For this reason, feature selection methods that find cancer-related genes without changing the raw expression data have been widely studied [6]. We propose an embedded feature selection algorithm based on linear regression and a neural network, called boosted regression-based feature selection for the multilayer perceptron (BREG-MLP). Linear regression analysis is applied to extract important features without changing the raw data. The proposed BREG-MLP is applied to six different human cancer-related gene expression profiles to extract gene signatures.
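
As a rough illustration of regression-based selection on raw features, the sketch below fits a LASSO model by cyclic coordinate descent and keeps the features whose coefficients are nonzero, then repeats the fit on bootstrap resamples and keeps the features chosen in a majority of rounds. The resampling vote is a simple bagging-style stand-in, not the boosting scheme or the MLP-based evaluation of BREG-MLP, whose details are not given in this summary. All function names, the toy data, and the parameter values (`penalty`, `n_rounds`, `min_votes`) are assumptions made for illustration.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the L1 proximal step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + penalty*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = [float(v) for v in y]  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant-zero feature can never be selected
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def selected_features(w):
    """Indices of features with a nonzero LASSO coefficient."""
    return [j for j, coef in enumerate(w) if coef != 0.0]


def ensemble_select(X, y, penalty, n_rounds=15, min_votes=8, seed=0):
    """Assumed stand-in for an ensemble selector: majority vote over bootstrap fits."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    votes = [0] * p
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        w = lasso_coordinate_descent([X[i] for i in idx], [y[i] for i in idx], penalty)
        for j in selected_features(w):
            votes[j] += 1
    return [j for j in range(p) if votes[j] >= min_votes]


# Toy data (hypothetical): 8 samples, 4 mutually orthogonal +/-1 features.
# Only features 0 and 3 drive the response.
X = [[(-1) ** bin(i & m).count("1") for m in (1, 2, 4, 7)] for i in range(8)]
y = [3 * row[0] - 2 * row[3] for row in X]

single = selected_features(lasso_coordinate_descent(X, y, penalty=0.5))
voted = ensemble_select(X * 8, y * 8, penalty=0.5)  # 64 rows for resampling
print("single fit:", single)
print("ensemble vote:", voted)
```

On the toy data the single fit keeps features 0 and 3 with coefficients shrunk by the penalty (2.5 and -1.5 rather than 3 and -2), which is the usual LASSO trade-off between sparsity and bias. In a pipeline like the one the paper describes, the selected subset would then be passed to a classifier such as an MLP to check that classification performance is maintained.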

RELATED WORKS
BACKGROUND
MULTILAYER PERCEPTRON
BOOSTING
METHODS
RESULT
DATA SETS
EFFECT OF BOOSTING
EFFECT OF PERFORMANCE THRESHOLD
EFFECT OF CANDIDATE REGRESSION MODELS
COMPUTATIONAL COMPLEXITY
Findings
CONCLUSION