Multi-subspace graph clustering joint dimensionality reduction and feature selection
- Research Article
13
- 10.1007/s10462-020-09889-4
- Aug 20, 2020
- Artificial Intelligence Review
In many pattern recognition applications, feature selection and instance selection can be used as two data preprocessing methods that aim at reducing the computational cost of the learning process. Moreover, in some cases, feature subset selection can improve the classification performance. Feature selection and instance selection are of interest because the choice of features and instances greatly influences the performance of the learnt models as well as their training costs. In the past, unifying both problems was carried out by solving a global optimization problem using meta-heuristics. This paradigm not only fails to exploit the manifold structure of the data but can also be computationally expensive. To the best of our knowledge, the joint use of sparse modeling representative selection and feature subset relevance has not been exploited by joint feature and instance selection methods. In this paper, we target joint feature and instance selection by adopting feature subset relevance and sparse modeling representative selection. More precisely, we propose three schemes for joint feature and instance selection. The first is a wrapper technique while the two remaining ones are filter approaches. In the filter approaches, the search process adopts a genetic algorithm in which the evaluation is mainly given by a score that quantifies the goodness of the features and instances. An efficient instance selection technique is used and integrated in the search process in order to adapt the instances to the candidate feature subset. We evaluate the performance of the proposed schemes using image classification where the classifiers are the nearest neighbor classifier and the support vector machine classifier. The study is conducted on five public image datasets. These experiments show the superiority of the proposed schemes over various baselines.
The results confirm that the filter approaches lead to promising improvements in classification accuracy when both feature selection and instance selection are adopted.
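To make the wrapper idea concrete, here is a minimal sketch of how a candidate solution could be scored. It assumes a candidate is encoded as two binary masks (one over features, one over training instances) and that fitness is the 1-nearest-neighbor accuracy on a held-out validation set. The function name `wrapper_fitness`, the mask encoding, and the toy data are illustrative assumptions, not the paper's actual scheme.

```python
from math import dist


def wrapper_fitness(X_train, y_train, X_val, y_val, feature_mask, instance_mask):
    """Return 1-NN validation accuracy using only the selected features/instances.

    feature_mask and instance_mask are 0/1 lists (hypothetical encoding).
    """
    feats = [j for j, keep in enumerate(feature_mask) if keep]
    protos = [i for i, keep in enumerate(instance_mask) if keep]
    if not feats or not protos:
        return 0.0  # an empty selection cannot classify anything

    def project(x):
        # Keep only the selected feature columns.
        return [x[j] for j in feats]

    prototypes = [(project(X_train[i]), y_train[i]) for i in protos]
    correct = 0
    for x, label in zip(X_val, y_val):
        q = project(x)
        # Nearest selected training instance in the reduced feature space.
        _, predicted = min(prototypes, key=lambda p: dist(p[0], q))
        if predicted == label:
            correct += 1
    return correct / len(X_val)


# Toy data: feature 0 is informative, feature 1 is misleading noise.
X_train = [[0.0, 0.0], [1.0, 10.0]]
y_train = [0, 1]
X_val = [[0.1, 9.0], [0.9, 1.0]]
y_val = [0, 1]

print(wrapper_fitness(X_train, y_train, X_val, y_val, [1, 1], [1, 1]))
print(wrapper_fitness(X_train, y_train, X_val, y_val, [1, 0], [1, 1]))
```

In a search procedure such as a genetic algorithm, a function of this kind would be called once per candidate, which is why wrapper schemes tend to cost more than filter scores that avoid running a classifier.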
- Research Article
1
- 10.1007/s00500-022-07513-x
- Oct 13, 2022
- Soft Computing
Feature selection and instance selection are two data preprocessing methods widely used in data mining and pattern recognition. The main goal is to reduce the computational cost of many learning tasks. Recently, joint feature and instance selection has been approached by solving some global optimization problems using meta-heuristics. This approach is not only computationally expensive, but also does not exploit the fact that the data usually have a structured manifold implicitly hidden in the data and its labels. In this paper, we address joint feature and instance selection using scores derived from discriminant analysis theory. We present three approaches for joint feature and instance selection. The first scheme is a wrapper technique, while the other two schemes are filtering techniques. In the filtering approaches, the search process uses a genetic algorithm where the evaluation criterion is mainly given by the discriminant analysis score. This score depends simultaneously on the feature subset candidate and the best corresponding subset of instances. Thus, the best feature subset and the best instances are determined by finding the best score. The performance of the proposed approaches is quantified and studied using image classification with Nearest Neighbor and Support Vector Machine Classifiers. Experiments are conducted on five public image datasets. We compare the performance of our proposed methods with several state-of-the-art methods. The experiments performed show the superiority of the proposed methods over several baseline methods.
- Abstract
31
- 10.1186/1471-2105-14-s14-s16
- Oct 1, 2013
- BMC Bioinformatics
Background: In drug discovery and development, it is crucial to determine which conformers (instances) of a given molecule are responsible for its observed biological activity and at the same time to recognize the most representative subset of features (molecular descriptors). Due to experimental difficulty in obtaining the bioactive conformers, computational approaches such as machine learning techniques are much needed. Multiple Instance Learning (MIL) is a machine learning method capable of tackling this type of problem. In the MIL framework, each instance is represented as a feature vector, which usually resides in a high-dimensional feature space. The high dimensionality may provide significant information for learning tasks, but at the same time it may also include a large number of irrelevant or redundant features that might negatively affect learning performance. Reducing the dimensionality of data will hence facilitate the classification task and improve the interpretability of the model. Results: In this work we propose a novel approach, named multiple instance learning via joint instance and feature selection. The iterative joint instance and feature selection is achieved using an instance-based feature mapping and 1-norm regularized optimization. The proposed approach was tested on four biological activity datasets. Conclusions: The empirical results demonstrate that the selected instances (prototype conformers) and features (pharmacophore fingerprints) have competitive discriminative power and the convergence of the selection process is also fast.
- Research Article
- 10.1080/15435075.2024.2449155
- Jan 5, 2025
- International Journal of Green Energy
Accurate identification of industrial loads is crucial for user behavior analysis and power scheduling. To address the limitations of traditional feature selection methods in industrial load identification, this paper proposes a joint feature selection method based on Atomic Search Optimization (ASO) to enhance the accuracy and efficiency of load recognition. By selecting five typical industrial devices, a dataset of raw electrical measurement data was constructed. First, dual sampling was performed on the original power load data to extract time-domain features, frequency-domain features, and entropy features. Subsequently, ASO was utilized to filter the joint features of industrial loads. Finally, the features selected by ASO were input into Discriminant Analysis (DA), Decision Trees (DT), k-nearest Neighbors (KNN), Naive Bayes (NB), and Support Vector Machines (SVM) to verify the capability of ASO in selecting joint features for industrial load identification. The experimental results demonstrate that ASO can identify eight features most contributive to industrial load recognition from thirty joint features, achieving an average classification accuracy of 91.31% across the five classifiers. These findings provide a solid decision-making basis for smart management on the user side and intelligent scheduling on the grid side, showcasing the effectiveness and application potential of ASO in feature selection.
- Book Chapter
8
- 10.1007/978-3-642-38562-9_25
- Jan 1, 2013
Due to the absence of class labels, unsupervised feature selection is much more difficult than supervised feature selection. Traditional unsupervised feature selection algorithms usually select features to preserve the structure of the data set. Inspired by recent developments in discriminative clustering, we propose in this paper a novel unsupervised feature selection approach via Joint Clustering and Feature Selection (JCFS). Specifically, we integrate the Fisher score into the clustering framework. We select those features such that the Fisher criterion is maximized and the manifold structure is best preserved simultaneously. We also discover the connection between JCFS and other clustering and feature selection methods, such as discriminative K-means, JELSR and DCS. Experimental results on real world data sets demonstrate the effectiveness of the proposed algorithm.
Keywords: Unsupervised Feature Selection, Fisher Score, Spectral Clustering
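For readers unfamiliar with the Fisher score mentioned above, the following is a minimal sketch of the standard per-feature score: between-class scatter divided by within-class scatter. In an unsupervised setting such as JCFS, the labels `y` would have to come from cluster assignments; that pairing, the function name `fisher_scores`, and the toy data are assumptions for illustration and do not reproduce the JCFS optimization itself.

```python
from collections import defaultdict


def fisher_scores(X, y):
    """Per-feature Fisher score: sum_k n_k (mu_kj - mu_j)^2 / sum_k n_k var_kj."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)

    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for rows in groups.values():
            values = [row[j] for row in rows]
            class_mean = sum(values) / len(values)
            between += len(values) * (class_mean - overall_mean) ** 2
            # n_k * (population variance) equals the sum of squared deviations.
            within += sum((v - class_mean) ** 2 for v in values)
        if within > 0:
            scores.append(between / within)
        else:
            # Degenerate case: no within-class spread for this feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


# Feature 0 separates the two groups; feature 1 does not.
X = [[1.0, 5.0], [2.0, 1.0], [8.0, 5.0], [9.0, 1.0]]
y = [0, 0, 1, 1]  # class labels, or cluster pseudo-labels in an unsupervised setting
print(fisher_scores(X, y))
```

A higher score marks a feature whose values differ strongly between groups relative to their spread inside each group.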
- Research Article
52
- 10.1162/evco_a_00102
- Aug 8, 2013
- Evolutionary Computation
Instance selection is becoming increasingly relevant due to the huge amount of data that is constantly produced in many fields of research. At the same time, most of the recent pattern recognition problems involve highly complex datasets with a large number of possible explanatory variables. For many reasons, this abundance of variables significantly harms classification or recognition tasks. There are efficiency issues, too, because the speed of many classification algorithms is largely improved when the complexity of the data is reduced. One of the approaches to address problems that have too many features or instances is feature or instance selection, respectively. Although most methods address instance and feature selection separately, both problems are interwoven, and benefits are expected from facing these two tasks jointly. This paper proposes a new memetic algorithm for dealing with many instances and many features simultaneously by performing joint instance and feature selection. The proposed method performs four different local search procedures with the aim of obtaining the most relevant subsets of instances and features to perform an accurate classification. A new fitness function is also proposed that enforces instance selection but avoids putting too much pressure on removing features. We prove experimentally that this fitness function improves the results in terms of testing error. Regarding the scalability of the method, an extension of the stratification approach is developed for simultaneous instance and feature selection. This extension allows the application of the proposed algorithm to large datasets. An extensive comparison using 55 medium to large datasets from the UCI Machine Learning Repository shows the usefulness of our method. Additionally, the method is applied to 30 large problems, with very good results. The accuracy of the method for class-imbalanced problems in a set of 40 datasets is shown. 
The usefulness of the method is also tested using decision trees and support vector machines as classification methods.
- Research Article
82
- 10.1016/j.knosys.2020.106020
- May 18, 2020
- Knowledge-Based Systems
Joint imbalanced classification and feature selection for hospital readmissions
- Research Article
97
- 10.1109/lgrs.2015.2506570
- Feb 1, 2016
- IEEE Geoscience and Remote Sensing Letters
Selecting discriminative features and constructing an appropriate classifier are two essential factors for ship classification in a synthetic aperture radar (SAR) image. Unfortunately, these two factors are rarely considered together by existing studies. We propose a joint feature and classifier selection method by integrating the classifier selection strategy into a wrapper feature selection framework. The sequential forward floating search algorithm is improved to search efficiently for an optimal feature-scaling-classifier triplet. Comprehensive experiments on two data sets demonstrate that the proposed method can select the optimal combination of a nonredundant complementary feature subset, appropriate scaling, and classifier to improve the performance of ship classification in a SAR image.
- Conference Article
17
- 10.1109/icsmc.2011.6083921
- Oct 1, 2011
This work presents a method for improving classifier accuracy through joint feature selection and hierarchical classifier design with genetic algorithms. The hierarchical classifier divides the classification problem into a set of smaller problems using multiple feature-selected classifiers in a tree configuration to separate the data into progressively smaller groups of classes. This allows the use of more specific feature sets for each set of classes. Several existing performance measures for evaluating the feature sets are investigated, and a new measure, count-based RELIEF is proposed. The joint feature selection and hierarchical classifier design method is tested on two artificial data sets. Results indicate that the feature selected hierarchical classifiers are able to achieve better accuracy than a non-hierarchical classifier using feature selection alone. The newly proposed performance measure is also tested and shown to provide a better indication of classifier performance than existing methods.
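The abstract does not describe count-based RELIEF in detail, so the sketch below shows only a simplified version of the classic Relief idea that it builds on: a feature gains weight when it differs on the nearest instance of another class (nearest miss) and loses weight when it differs on the nearest instance of the same class (nearest hit). Iterating over every instance instead of a random sample, the range-normalized Manhattan distance, and the name `relief_weights` are simplifying assumptions made here.

```python
def relief_weights(X, y):
    """Simplified Relief: one nearest hit and one nearest miss per instance."""
    n, d = len(X), len(X[0])

    # Feature ranges, used to scale differences into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def distance(a, b):
        return sum(diff(a, b, j) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # cannot form a hit/miss pair for this instance
        near_hit = min(hits, key=lambda k: distance(X[i], X[k]))
        near_miss = min(misses, key=lambda k: distance(X[i], X[k]))
        for j in range(d):
            weights[j] += (diff(X[i], X[near_miss], j) - diff(X[i], X[near_hit], j)) / n
    return weights


# Feature 0 separates the classes; feature 1 varies within each class.
X = [[0.0, 0.0], [0.1, 1.0], [1.0, 0.1], [0.9, 0.9]]
y = [0, 0, 1, 1]
print(relief_weights(X, y))
```

On this toy data the first weight comes out positive and the second negative, matching the intuition that only feature 0 is useful for separating the classes.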
- Book Chapter
2
- 10.1007/978-3-030-42128-1_4
- Jan 1, 2020
Feature selection is a fundamental problem in learning. We are immersed in a huge quantity of spatial and temporal data, and one of the crucial questions if we want to learn efficiently is to find the key cues that are correlated with our specific learning task. Often the task itself is not supervised, that is, we do not know exactly what we are looking for. In that case, we turn our attention again towards the natural clustering and correlations that take place in the spatiotemporal world. In this chapter, we present an efficient method that performs joint classifier learning and feature selection. The approach is able to discover sparse, compact representations of input features from a vast sea of candidates, with an almost unsupervised formulation. For the main algorithm to work, we require only one piece of knowledge: for each feature, whether or not it has on average stronger values over positive samples than over negatives. We call this bit of knowledge the feature sign. What is interesting is that the mathematical formulation of the problem follows directly from the clustering approach from Chap. 3, which is in turn related to the initial graph matching formulation from Chap. 2. The main feature selection idea boils down to discovering the cluster of features that fire together and are strongly correlated. The spikes in strongly intercorrelated group firings are often robust indicators of the presence of the positive class. In experiments, we show that discovering the correct set of relevant features can be done using as few as a single labeled training sample per class, used to estimate the feature signs. Then, using these feature signs, we extend an initial supervised learning problem into an unsupervised learning formulation that can incorporate new data without requiring ground truth labels. Thus our method works, simultaneously, both as a feature selection mechanism and as a fully competitive classifier.
Our original algorithm has certain theoretical guarantees, low computational cost, and excellent accuracy, especially in difficult cases of very limited training data. Its practical value is demonstrated on large-scale recognition experiments in video: it outperforms in speed and accuracy established feature selection approaches such as AdaBoost, Lasso, greedy forward-backward selection, and powerful classifiers such as SVM, especially in the case of limited supervised training data.
- Research Article
- 10.1109/tcbbio.2025.3605691
- Jan 1, 2025
- IEEE transactions on computational biology and bioinformatics
Sparse Partial Least Squares (sPLS) is a common dimensionality reduction technique for data fusion, which projects data samples from two views by seeking linear combinations with a small number of variables with the maximum variance. However, sPLS extracts the combinations between two data sets using all data samples, so it cannot detect latent subsets of samples. To extend the application of sPLS by identifying a specific subset of samples and removing outliers, we propose an $\ell_\infty/\ell_0$-norm constrained weighted sparse PLS ($\ell_\infty/\ell_0$-wsPLS) method for joint sample and feature selection, where the $\ell_\infty/\ell_0$-norm constraints are used to select a subset of samples. We prove that the $\ell_\infty/\ell_0$-norm constraints have the Kurdyka-Łojasiewicz property, so that a globally convergent algorithm is developed to solve the model. Moreover, multi-view data with the same set of samples are available in various real problems. To this end, we extend the $\ell_\infty/\ell_0$-wsPLS model and propose two multi-view wsPLS models for multi-view data fusion. We develop an efficient iterative algorithm for each multi-view wsPLS model and show its convergence property. Numerical and biomedical data experiments demonstrate the efficiency of the proposed methods.
- Conference Article
54
- 10.1145/3377930.3389815
- Jun 25, 2020
Both feature selection and hyperparameter tuning are key tasks in machine learning. Hyperparameter tuning is often useful to increase model performance, while feature selection is undertaken to attain sparse models. Sparsity may yield better model interpretability and lower cost of data acquisition, data handling and model inference. While sparsity may have a beneficial or detrimental effect on predictive performance, a small drop in performance may be acceptable in return for a substantial gain in sparseness. We therefore treat feature selection as a multi-objective optimization task. We perform hyperparameter tuning and feature selection simultaneously because the choice of features of a model may influence what hyperparameters perform well. We present, benchmark, and compare two different approaches for multi-objective joint hyperparameter optimization and feature selection: The first uses multi-objective model-based optimization. The second is an evolutionary NSGA-II-based wrapper approach to feature selection which incorporates specialized sampling, mutation and recombination operators. Both methods make use of parameterized filter ensembles. While model-based optimization needs fewer objective evaluations to achieve good performance, it incurs computational overhead compared to the NSGA-II, so the preferred choice depends on the cost of evaluating a model on given data.
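Treating feature selection as a multi-objective task means comparing candidates by Pareto dominance instead of a single score. The sketch below filters a list of hypothetical `(error, n_features)` pairs down to the non-dominated ones, with both objectives minimized. It illustrates the dominance relation that NSGA-II-style methods rely on, not the NSGA-II procedure or the paper's operators; the function names and numbers are made up for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    Both objectives are minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical (validation error, number of selected features) pairs.
candidates = [(0.10, 20), (0.12, 5), (0.11, 20), (0.30, 2), (0.12, 8)]
print(pareto_front(candidates))
# → [(0.1, 20), (0.12, 5), (0.3, 2)]
```

Each surviving pair represents a different trade-off: the user can accept a small increase in error in exchange for a much sparser model, which is the situation the abstract describes.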
- Research Article
16
- 10.1016/j.knosys.2019.104915
- Aug 5, 2019
- Knowledge-Based Systems
Joint sample and feature selection via sparse primal and dual LSSVM
- Conference Article
8
- 10.1109/bibm.2011.82
- Nov 1, 2011
There are a vast number of biology-related research problems involving a combination of multiple sources of data to achieve a better understanding of the underlying problems. It is important to select and interpret the most important information from these sources. Thus it will be beneficial to have a good algorithm to simultaneously extract rules and select features for better interpretation of the predictive model. We propose an efficient algorithm, Joint Rule Extraction and Feature Selection (JRF), based on 1-norm regularized random forests. JRF simultaneously extracts a small number of rules generated by random forests and selects important features. We applied JRF to several drug activity prediction and microarray data sets. JRF is capable of producing performance comparable with state-of-the-art prediction algorithms using a small number of decision rules. Some of the decision rules are biologically significant.
- Research Article
27
- 10.1109/tcsvt.2021.3073937
- Apr 20, 2021
- IEEE Transactions on Circuits and Systems for Video Technology
Domain adaptation aims to exploit domain-invariant features by aligning the cross-domain distributions in the manifold subspace for applying the classifier trained on the source domain to the target domain. However, two limitations may still degrade the performance of existing methods: (1) the influences of noisy or irrelevant features in the original feature space are ignored, which may unexpectedly hurt the classification of target samples; (2) the graph constructed directly in the original data space cannot accurately capture the inherent local manifold structures of high-dimensional data due to the curse of dimensionality, which may seriously mislead the learning of transferable features. In this paper, we propose a novel approach to address these problems, referred to as joint Adaptive Dual Graph and Feature Selection for domain adaptation (ADGFS). Specifically, feature selection can characterize the relative importance of different features through a scaling factor, which enables ADGFS to not only reduce the impacts of noisy or irrelevant features on knowledge transfer but also learn informative domain-invariant features. Meanwhile, ADGFS adaptively optimizes the dual graph by learning the similarity matrices of both instance-level and feature-level graphs in the projected low-dimensional manifold subspace rather than the original high-dimensional space, such that the intrinsic local manifold structures of data can be captured precisely. Moreover, ADGFS simultaneously aligns the marginal and conditional probability distributions in the nonnegative matrix factorization framework to narrow the distribution discrepancies between the two different domains, which can adequately transfer knowledge from the source domain to the target domain. Comprehensive experiments on four benchmark datasets demonstrate the effectiveness of the proposed approach in cross-domain image classification.