Embedded Feature Selection Method Research Articles

Optical remote sensing techniques can indicate the properties of objects by observing different modalities (physical quantities) of the backscattered light at different optical wavelengths. Established examples are reflectance, fluorescence, Raman, or depolarization spectroscopy. LiDAR sensing, on the other hand, allows acquiring the geometry of objects by measuring the propagation delay of optical probing signals. Multimodal multispectral (MM) LiDAR combines these capabilities and extends conventional monochromatic LiDAR in both spectral and modal dimensions within a single instrument, thus enriching point cloud data with non-geometric information. The potentially high dimension of MM LiDAR data, however, poses significant challenges for instrumental design, data acquisition, and data processing. MM LiDAR data are structured, in that several or all modalities are available in each of the spectral channels. These challenges can thus be mitigated by feature selection (FS) if the structure of the features is taken into account, i.e., if entire spectral channels or entire modalities are selected or omitted. Herein, we focus on feature selection for MM LiDAR and propose a multiclass group feature selection algorithm (MGSVM FS) consisting of a structural sparsity-based embedded feature selection method with an all-in-one support vector machine (SVM). It jointly tackles the challenges arising from the high dimension of the MM data and the need for a multiclass classification task while exploiting the structure of the MM data. In addition, we introduce a complete workflow for evaluating the feature selection and for decision-making. We apply the framework to selecting an optimum spectral and modal configuration for remote material classification using an experimental MM LiDAR system that provides reflectance, distance, and degree of linear polarization in 28 spectral channels of 10 nm width.
For the experimental investigation, we use MM LiDAR data obtained in a controlled lab environment from thirty specimens of four material classes relevant for construction. Using all three modalities, we find a configuration with only 3 spectral channels that achieves a classification mean-F1 score of 100% within this small dataset. Similar classification performance can also be achieved with only two modalities when using more spectral channels. In this application example, MGSVM FS improves the classification mean-F1 score by up to 25% as compared to random selection and outperforms two other commonly used filter and embedded feature selection methods. The proposed group feature selection algorithm and decision-making are useful for MM LiDAR, providing a link between instrumental design, data acquisition, and data processing. However, they are also transferable to other application fields related to multiclass classification, regression, and knowledge discovery, with features structured in groups. The collected MM feature dataset, the MGSVM FS algorithm, and the evaluation pipeline are accessible online at https://github.com/yuhan-yhyh/Dataset_Code_MGSVM-FS-MM-LiDAR.git.
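
The idea of selecting or omitting entire spectral channels can be illustrated with a small sketch of group soft-thresholding, the shrinkage step commonly associated with group-sparsity penalties. This is not the MGSVM FS algorithm from the abstract: the weight values, the group layout (`ch1` to `ch3`, two modalities each), and the threshold `lam` are all hypothetical, chosen only to show how a whole group of features is kept or dropped together.

```python
import math


def group_soft_threshold(weights, groups, lam):
    """Shrink each group's L2 norm by lam; groups with norm <= lam become zero."""
    out = list(weights)
    for indices in groups.values():
        norm = math.sqrt(sum(weights[i] ** 2 for i in indices))
        # Scale factor is 0 when the whole group is weaker than the threshold.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in indices:
            out[i] = weights[i] * scale
    return out


def selected_groups(weights, groups, tol=1e-12):
    """Return the names of groups that still have a non-zero weight."""
    return [name for name, indices in groups.items()
            if any(abs(weights[i]) > tol for i in indices)]


# Hypothetical toy layout: 3 spectral channels x 2 modalities per channel.
weights = [0.9, -0.4, 0.05, 0.02, -0.6, 0.7]
channels = {"ch1": [0, 1], "ch2": [2, 3], "ch3": [4, 5]}

shrunk = group_soft_threshold(weights, channels, lam=0.1)
kept = selected_groups(shrunk, channels)
print(kept)
```

Grouping the same indices by modality instead of by channel (for example, all reflectance features in one group) would apply the same step to select or omit entire modalities.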

Prediction of the stage of cancer plays an important role in planning the course of treatment and has largely relied on imaging tools, which do not capture the molecular events that cause cancer progression. Gene-expression data-based analyses are able to identify these events, allowing RNA-sequence and microarray cancer data to be used for cancer analyses. Breast cancer is the most common cancer worldwide and is classified into four stages (1, 2, 3, and 4) [2]. While machine learning models have previously been explored for stage classification with limited success, multi-class stage classification has not seen significant progress. There is a need for improved multi-class classification models, for example by investigating deep learning models. Gene-expression-based cancer data is characterised by the small size of available datasets, class imbalance, and high dimensionality. Class balancing methods must therefore be applied to the dataset. Since not all genes are necessary for stage prediction, retaining only the necessary genes can improve classification accuracy. The breast cancer samples are to be classified into 4 classes, stages 1 to 4. Invasive ductal carcinoma breast cancer samples are obtained from The Cancer Genome Atlas (TCGA) and Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) datasets and combined. Two class balancing techniques are explored: synthetic minority oversampling technique (SMOTE), and SMOTE followed by random undersampling. A hybrid feature selection pipeline is proposed, with three pipelines explored involving combinations of filter and embedded feature selection methods: Pipeline 1: minimum-redundancy maximum-relevancy (mRMR) and correlation feature selection (CFS); Pipeline 2: mRMR, mutual information (MI), and CFS; and Pipeline 3: mRMR and support vector machine-recursive feature elimination (SVM-RFE).
The classification is done using deep learning models, namely a deep neural network, a convolutional neural network, a recurrent neural network, a modified deep neural network, and an AutoKeras-generated model. Classification performance after class balancing and the various feature selection techniques shows marked improvement over classification prior to feature selection. The best multiclass classification was achieved by a deep neural network after SMOTE and random undersampling, with feature selection using mRMR and recursive feature elimination, yielding a Cohen-Kappa score of 0.303 and a classification accuracy of 53.1%. For binary classification into early and late-stage cancer, the best performance is obtained by a modified deep neural network (DNN) after SMOTE and random undersampling, with feature selection using mRMR and recursive feature elimination, yielding an accuracy of 81.0% and a Cohen-Kappa score (CKS) of 0.280. This pipeline also showed improved multiclass classification performance on neuroblastoma cancer data, with a best area under the receiver operating characteristic curve (auROC) score of 0.872, as compared to 0.71 obtained in previous work, an improvement of 22.81%. The results and analysis reveal that feature selection techniques play a vital role in gene-expression data-based classification, and that the proposed hybrid feature selection pipeline improves classification performance. Multi-class classification is possible using deep learning models, though further improvement is necessary, particularly in late-stage classification.
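
Because the abstract reports both accuracy and Cohen-Kappa scores, a short sketch of how the two relate may help. The function below computes Cohen's kappa from its standard definition, (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The label counts are invented toy numbers, not data from the study; they only illustrate how an imbalanced binary problem can combine high accuracy with a modest kappa.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels, corrected for chance agreement."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed agreement is the plain classification accuracy.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal label frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one class is present")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical imbalanced toy data: 85 "early" and 15 "late" samples.
y_true = ["early"] * 85 + ["late"] * 15
y_pred = (["early"] * 76 + ["late"] * 9      # predictions for the 85 early cases
          + ["late"] * 5 + ["early"] * 10)   # predictions for the 15 late cases

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
kappa = cohen_kappa(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, kappa={kappa:.3f}")
```

In this toy case most of the agreement comes from the majority class, which chance alone would largely reproduce, so kappa stays well below the accuracy.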

Articles published on Embedded Feature Selection Method

Exploring key factors for long-term vessel incident risk prediction

Group-feature (Sensor) selection with controlled redundancy using neural networks

A Gaussian process embedded feature selection method based on automatic relevance determination

A Generalized Lightweight Intrusion Detection Model With Unified Feature Selection for Internet of Things Networks

Safe dynamic sparse training of modified RBF networks for joint feature selection and classification

Fuzzy Neighborhood-Based Manifold Learning and Feature Weight Matrix for Multilabel Feature Selection

A method to assist designers in optimizing the exterior styling of vehicles based on key features

A feature selection method for multimodal multispectral LiDAR sensing

Investigating the potential of EMA-embedded feature selection method for ESVR and LSTM to enhance the robustness of monthly streamflow forecasting from local meteorological information

A novel hybrid model combined with ensemble embedded feature selection method for estimating reference evapotranspiration in the North China Plain

Finding New VEGFR2 Inhibitors Using Support Vector Machine Classification Model

Discriminative multi-label feature selection with adaptive graph diffusion

An embedded feature selection method based on generalized classifier neural network for cancer classification

Feature selection under budget constraint in medical applications: analysis of penalized empirical risk minimization methods

GradWise: A Novel Application of a Rank-Based Weighted Hybrid Filter and Embedded Feature Selection Method for Glioma Grading with Clinical and Molecular Characteristics

An embedded feature selection approach for depression classification using short text sequences

Enhancing the prediction of IDC breast cancer staging from gene expression profiles using hybrid feature selection methods and deep learning architecture

Double groups of gates based Takagi-Sugeno-Kang (DG-TSK) fuzzy system for simultaneous feature selection and rule extraction

Automated diagnosis of Retinopathy of prematurity from retinal images of preterm infants using hybrid deep learning techniques

Research on the Prediction of Operator Users’ Number Portability Based on Community Detection
