Feature Selection Stability Research Articles

Simple Summary: In this study, we investigated the potential of radiomic models to discriminate nasopharyngeal carcinoma from benign hyperplasia on MRI, which is important for enabling screening programs to detect cancer early. We found that although radiomics showed promising performance, the feature selection step in the radiomics pipeline was unstable, which could undermine its reliability. Therefore, we built a radiomics model using 17 features selected from a pool of 422 features by a proposed ensemble technique that improved feature selection stability using a combination of bagging and boosting. This radiomic model achieved an area under the curve of 0.85 and 0.80 for discriminating the two abnormalities on the training and testing data, respectively. In addition, the proposed feature selection technique significantly improved stability compared with well-established techniques.

Discriminating early-stage nasopharyngeal carcinoma (NPC) from benign hyperplasia (BH) on MRI is a challenging but important task for the early detection of NPC in screening programs. Radiomics models have the potential to meet this challenge, but instability in the feature selection step may reduce their reliability. Therefore, in this study, we aim to discriminate between early-stage T1 NPC and BH on MRI using radiomics and propose a method to improve the stability of the feature selection step in the radiomics pipeline. A radiomics model was trained using data from 442 patients (221 with early-stage T1 NPC and 221 with BH) scanned at 3T and tested on 213 patients (99 with early-stage T1 NPC and 114 with BH) scanned at 1.5T. To verify the improvement in feature selection stability, we compared our proposed ensemble technique, which uses a combination of bagging and boosting (BB-RENT), with the well-established elastic net.

The proposed radiomics model achieved an area under the curve of 0.85 (95% confidence interval (CI): 0.82–0.89) and 0.80 (95% CI: 0.74–0.86) in discriminating NPC and BH in the 3T training and 1.5T testing cohorts, respectively, using 17 features selected from a pool of 422 features by the proposed feature selection technique. BB-RENT showed better feature selection stability than the elastic net (Jaccard index = 0.39 ± 0.14 and 0.24 ± 0.06, respectively; p < 0.001).
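The abstract reports stability as a Jaccard index but does not say how it was aggregated across runs. As a rough illustration only, the sketch below assumes one common convention: compute the Jaccard index for every pair of feature subsets selected on different resamples, then average. The function names and the toy feature labels are hypothetical and not taken from the study.

```python
from itertools import combinations
from statistics import mean


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two feature subsets."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty subsets are treated as identical (an assumption).
    return len(a & b) / len(union) if union else 1.0


def selection_stability(subsets):
    """Mean pairwise Jaccard index over feature subsets from repeated runs.

    Assumption: averaging over all pairs; the study may aggregate differently.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return mean(jaccard(a, b) for a, b in pairs)


# Toy example: features selected on three hypothetical resamples.
runs = [{"f1", "f2", "f3"}, {"f1", "f2", "f4"}, {"f1", "f3", "f4"}]
score = selection_stability(runs)
```

A score of 1 means every run selected the same features, and a score near 0 means the selected subsets barely overlap.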


Feature selection (FS, i.e., selection of a subset of predictor variables) is essential in high-dimensional datasets to prevent overfitting of prediction/classification models and to reduce computation time and resources. In genomics, FS makes it possible to identify relevant markers and to design low-density SNP chips for evaluating selection candidates. In this research, several univariate and multivariate FS algorithms combined with various parametric and non-parametric learners were applied to the prediction of feed efficiency in growing pigs from high-dimensional genomic data. The objective was to find the combination of feature selector, SNP subset size, and learner that leads to the most accurate and stable (i.e., least sensitive to changes in the training data) prediction models. Genomic best linear unbiased prediction (GBLUP) without SNP pre-selection was the benchmark.

Three types of FS methods were implemented: (i) filter methods: univariate (univ.dtree, spearcor) or multivariate (cforest, mrmr), with random selection as benchmark; (ii) embedded methods: elastic net and least absolute shrinkage and selection operator (LASSO) regression; (iii) combination of filter and embedded methods. Ridge regression, support vector machine (SVM), and gradient boosting (GB) were applied after pre-selection performed with the filter methods. The data comprised 5,708 individual records of residual feed intake to be predicted from the animal’s own genotype. Accuracy (stability of results) was measured as the median (interquartile range) of the Spearman correlation between observed and predicted data in a 10-fold cross-validation.

The best prediction in terms of accuracy and stability was obtained with SVM and GB using 500 or more SNPs [0.28 (0.02) and 0.27 (0.04) for SVM and GB with 1,000 SNPs, respectively]. With larger subset sizes (1,000–1,500 SNPs), the filter method had no influence on prediction quality, which was similar to that attained with a random selection. With 50–250 SNPs, the FS method had a strong impact on prediction quality: prediction was very poor for tree-based methods combined with any learner, but good and similar to what was obtained with larger SNP subsets when spearcor or mrmr were implemented with or without embedded methods. Those filters also led to very stable results, suggesting their potential use for designing low-density SNP chips for genome-based evaluation of feed efficiency.
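The accuracy and stability summary described above (median and interquartile range of the per-fold Spearman correlation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `summarize_folds` and its input layout are hypothetical, the toy folds are made-up numbers, and the interquartile range uses the default quartile method of Python's `statistics.quantiles`, which may differ from the one used in the study.

```python
from statistics import median, quantiles


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def summarize_folds(fold_results):
    """Return (median, IQR) of per-fold Spearman correlations.

    fold_results: list of (observed, predicted) sequences, one per CV fold.
    """
    rhos = [spearman(obs, pred) for obs, pred in fold_results]
    q1, _, q3 = quantiles(rhos, n=4)
    return median(rhos), q3 - q1


# Toy folds with made-up values (three folds instead of ten, for brevity).
folds = [
    ([1, 2, 3, 4], [1, 2, 3, 4]),
    ([1, 2, 3, 4], [4, 3, 2, 1]),
    ([1, 2, 3, 4], [1, 3, 2, 4]),
]
accuracy, stability = summarize_folds(folds)
```

Under this reading, a higher median indicates better accuracy and a smaller interquartile range indicates results that vary less across folds.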

Articles published on Feature Selection Stability

Importance of feature selection stability in the classifier evaluation on high-dimensional genetic data

Improving the Feature Selection Stability of the Delta Test in Regression

A new definition for feature selection stability analysis

A Comprehensive Review of Feature Selection and Feature Selection Stability in Machine Learning

An Empirical Evaluation of Feature Selection Stability and Classification Accuracy

Data Integration-Possibilities of Molecular and Clinical Data Fusion on the Example of Thyroid Cancer Diagnostics.

Radiomics for Discrimination between Early-Stage Nasopharyngeal Carcinoma and Benign Hyperplasia with Stable Feature Selection on MRI.

On Feature Selection Algorithms and Feature Selection Stability Measures: A Comparative Analysis

Assessing The Stability And Selection Performance Of Feature Selection Methods Under Different Data Complexity

Stable feature selection based on brain storm optimisation for high‐dimensional data

Benchmark of filter methods for feature selection in high-dimensional gene expression survival data.

Individualized real‐time prediction of working memory performance by classifying electroencephalography signals

Improving nature-inspired algorithms for feature selection

Feature selection and threshold method based on fuzzy joint mutual information

Feature Selection Stability and Accuracy of Prediction Models for Genomic Prediction of Residual Feed Intake in Pigs Using Machine Learning.

Stable ant‐antlion optimiser for feature selection on high‐dimensional data

Stable bagging feature selection on medical data

Feature Selection and Feature Stability Measurement Method for High-Dimensional Small Sample Data Based on Big Data Technology.

Feature selection based on fuzzy joint mutual information maximization.

Inferring Protein Sequence-Function Relationships with Large-Scale Positive-Unlabeled Learning.
