Fuzzy neighborhood components analysis: Supervised dimensionality reduction under uncertain labels

Similar Papers
  • Research Article
  • Citations: 29
  • 10.1109/tpami.2012.20
Proximity-Based Frameworks for Generating Embeddings from Multi-Output Data
  • Nov 1, 2012
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Tingting Mu + 3 more

This paper is about supervised and semi-supervised dimensionality reduction (DR) by generating spectral embeddings from multi-output data based on the pairwise proximity information. Two flexible and generic frameworks are proposed to achieve supervised DR (SDR) for multilabel classification. One is able to extend any existing single-label SDR to multilabel via sample duplication, referred to as MESD. The other is a multilabel design framework that tackles the SDR problem by computing weight (proximity) matrices based on simultaneous feature and label information, referred to as MOPE, as a generalization of many current techniques. A diverse set of different schemes for label-based proximity calculation, as well as a mechanism for combining label-based and feature-based weight information by considering information importance and prioritization, are proposed for MOPE. Additionally, we summarize many current spectral methods for unsupervised DR (UDR), single/multilabel SDR, and semi-supervised DR (SSDR) and express them under a common template representation as a general guide to researchers in the field. We also propose a general framework for achieving SSDR by combining existing SDR and UDR models, and also a procedure of reducing the computational cost via learning with a target set of relation features. The effectiveness of our proposed methodologies is demonstrated with experiments with document collections for multilabel text categorization from the natural language processing domain.

  • Conference Article
  • Citations: 2
  • 10.1145/3287921.3287925
Reducing Class Overlapping in Supervised Dimension Reduction
  • Jan 1, 2018
  • Nguyen Trong Tung + 3 more

Dimension reduction seeks a low-dimensional subspace onto which high-dimensional data can be projected such that the discriminative properties of the original data are preserved. In supervised dimension reduction, class labels are integrated into the lower-dimensional representation to produce better results on classification tasks. The supervised dimension reduction (SDR) framework of [17] is one of the state-of-the-art methods; it takes into account not only the class labels but also the neighborhood graphs of the data, and has some advantages in preserving the within-class local structure and widening the between-class margin. However, the reduced-dimensional representation produced by the SDR framework suffers from the class overlapping problem, in which data points lie closer to a different class than to the class they belong to. Class overlapping can degrade classification quality. In this paper, we propose a new method to reduce the overlap for the SDR framework in [17]. The experimental results show that our method reduces the size of the overlapping set by an order of magnitude. As a result, our method significantly outperforms the pre-existing framework on the classification task. Moreover, visualization plots show that the reduced-dimensional representation learned by our method is more scattered for within-class data and more separated for between-class data, as compared to the pre-existing SDR framework.

  • Research Article
  • Citations: 51
  • 10.1016/j.patcog.2014.12.001
Two-stage multiple kernel learning for supervised dimensionality reduction
  • Dec 13, 2014
  • Pattern Recognition
  • Abdollah Nazarpour + 1 more

  • Book Chapter
  • Citations: 18
  • 10.1007/978-3-319-93647-5_6
Emotion Recognition Using Neighborhood Components Analysis and ECG/HRV-Based Features
  • Jan 1, 2018
  • Hany Ferdinando + 2 more

Previous research showed that supervised dimensionality reduction using Neighborhood Components Analysis (NCA) enhanced the performance of three-class emotion recognition using ECG only, where the features were the statistical distribution of dominant frequencies and the first differences after applying bivariate empirical mode decomposition (BEMD). This paper explores how much NCA enhances emotion recognition using ECG-derived features, especially standard HRV features with two different normalization methods, and the statistical distribution of instantaneous frequencies and the first differences calculated using the Hilbert-Huang Transform (HHT) after empirical mode decomposition (EMD) and BEMD. Results with the MAHNOB-HCI database were validated using subject-dependent and subject-independent scenarios with kNN as the classifier for the three-class problem in valence and arousal. A t-test was used to assess the results at significance level 0.05. Results show that NCA enhances the performance by up to 74% over the implementation without NCA, with p-values close to zero in most cases. Different feature extraction methods offered different performance levels in the baseline, but NCA enhanced them such that the performances were close to each other. In most experiments, the use of combined standardized and normalized HRV-based features improved performance. Using NCA on this database improved the standard deviation significantly for HRV-based features under the subject-independent scenario.

  • Conference Article
  • Citations: 1
  • 10.1145/3107411.3108225
Integrative Sufficient Dimension Reduction Methods for Multi-Omics Data Analysis
  • Aug 20, 2017
  • Yashita Jain + 1 more

With the advent of high-throughput genome-wide assays, it has become possible to simultaneously measure multiple types of genomic data. Several projects, such as TCGA, ICGC, and NCI-60, have generated comprehensive, multi-dimensional maps of key genomic changes (miRNA, mRNA, proteomics, etc.) from cancer samples [2,4]. These genomic data can be used for classifying tumour types [5]. Integrative analysis of these data from multiple sources can potentially provide additional biological insights, but methods for such analysis are lacking. One widely used way to handle high-dimensional data is to remove redundant information in the integrated sample. Most of the expressed genes overlap and can be projected onto a lower dimension, and then be used to classify different tumor types with little or no loss of information. Sufficient dimension reduction (SDR) [1], a supervised dimension reduction approach, can be ideal for achieving such a goal. In this paper, we propose a novel integrative SDR method that can reduce the dimensions of multiple data types simultaneously while sharing common latent structures to improve prediction and interpretation. In particular, we extend the sliced inverse regression (SIR) technique, a major SDR method, to integrate multiple omics data for simultaneous dimension reduction. SIR is a supervised dimension reduction method that assumes that the outcome variable Y depends on the predictor variable X through d unknown linear combinations of the predictor [3]. The predictor variable is replaced by its projection onto a lower-dimensional subspace of the predictor space without loss of information. The aim is to find the central subspace (CS), the intersection of all subspaces $\delta$ of the predictor space satisfying the property $Y \perp\!\!\!\perp X \mid P_{\delta} X$. To integrate multiple types of data, we propose and implement a new integrative sufficient dimension reduction method extending SIR [3], called integrative SIR.
The main idea is that we take into account all the multi-omics data simultaneously while finding a basis matrix for each data type with some shared latent structures. Finally, we obtain d-dimensional data, which is much smaller than the original data dimension. The reduced dimension d was chosen by cross-validation. To demonstrate the integrated analysis of multi-omics data, we applied and compared conventional SIR and integrative SIR to analyze mRNA, miRNA and proteomics expression profiles of a subset of cell lines from the NCI-60 panel. The data used is taken from [6]. The outcomes to classify are the CNS, Leukemia and Melanoma tumor types. We pre-screened 400 variables from each data type using a high-variance criterion. To estimate classification error, we performed random forest classification after applying each method, with leave-one-out cross-validation. We found that integrative SIR leads to lower classification error than conventional SIR. To summarize, we proposed a new integrative SIR method, a supervised dimension reduction technique for integrative analysis of multi-omics data types. Unlike conventional SDR methods, the new approach can reduce the dimensions of multiple omics data simultaneously while sharing common latent structures across data types without losing any information in prediction. By efficiently capturing the common information, our numerical study shows that integrative SIR classifies tumor types more accurately than conventional SDR methods.
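
For readers unfamiliar with SIR, the following is a minimal sketch of conventional (single data type) sliced inverse regression, not the integrative variant proposed in the abstract above. The function name `sir_directions`, the default of 10 slices, and the equal-size slicing of a sorted continuous response are illustrative assumptions; for a categorical outcome such as tumor type, the slices would more naturally be the classes themselves.

```python
import numpy as np

def sir_directions(X, y, d, n_slices=10):
    """Estimate d directions with conventional SIR (illustrative sketch)."""
    n, p = X.shape
    mu = X.mean(axis=0)
    # Whiten the predictors so that their sample covariance is the identity.
    L = np.linalg.cholesky(np.cov(X, rowvar=False))
    Z = np.linalg.solve(L, (X - mu).T).T
    # Slice the sample by sorted response and average Z within each slice.
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # The leading eigenvectors of the weighted slice-mean covariance
    # span the estimated subspace in whitened coordinates.
    _, V = np.linalg.eigh(M)
    top = V[:, ::-1][:, :d]
    # Map the directions back to the original predictor scale.
    return np.linalg.solve(L.T, top)
```

Calling `sir_directions(X, y, d=1)` on data where `y` depends on `X` through a single linear combination should return a `(p, 1)` array roughly parallel to that combination, up to sign and scale.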

  • Research Article
  • Citations: 117
  • 10.3390/make1010020
Recent Advances in Supervised Dimension Reduction: A Survey
  • Jan 7, 2019
  • Machine Learning and Knowledge Extraction
  • Guoqing Chao + 2 more

Recently, we have witnessed an explosive growth in both the quantity and dimension of data generated, which aggravates the high dimensionality challenge in tasks such as predictive modeling and decision support. Up to now, a large number of unsupervised dimension reduction methods have been proposed and studied. However, there is no specific review focusing on the supervised dimension reduction problem. Most studies performed classification or regression after unsupervised dimension reduction. However, learning the low-dimensional representation and the classification/regression model simultaneously offers two advantages: high accuracy and effective representation. Considering classification or regression as being the main goal of dimension reduction, the purpose of this paper is to summarize and organize the current developments in the field into three main classes: PCA-based, Non-negative Matrix Factorization (NMF)-based, and manifold-based supervised dimension reduction methods, as well as provide elaborated discussions on their advantages and disadvantages. Moreover, we outline a dozen open problems that can be further explored to advance the development of this topic.

  • Research Article
  • Citations: 17
  • 10.1080/02331888.2013.800067
Supervised invariant coordinate selection
  • May 30, 2013
  • Statistics
  • Eero Liski + 2 more

Dimension reduction plays an important role in high-dimensional data analysis. Principal component analysis, independent component analysis, and sliced inverse regression (SIR) are well-known but very different analysis tools for dimension reduction. It appears that these three approaches can all be seen as a comparison of two different scatter matrices $S_1$ and $S_2$. The components for dimension reduction are then given by the eigenvectors of $S_1^{-1} S_2$. In SIR, the second scatter matrix is supervised and therefore the choice of the components is based on the dependence between the observed random vector and a response variable. Based on these notions, we extend the invariant coordinate selection (ICS), allowing the second scatter matrix $S_2$ to be supervised; supervised ICS can then be used in supervised dimension reduction. It is remarkable that many supervised dimension reduction methods proposed in the literature, such as linear discriminant analysis, canonical correlation analysis, SIR, sliced average variance estimate, directional regression, and principal Hessian directions, can be reformulated in this way. Several families of supervised scatter matrices are discussed, and their use in supervised dimension reduction is illustrated with a real data example and simulations.
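
The two-scatter-matrix idea can be illustrated with a short sketch. It assumes the components are the eigenvectors of $S_1^{-1} S_2$, as in standard ICS, and makes one particular choice that the abstract does not prescribe: $S_1$ is the ordinary sample covariance and $S_2$ is the between-class scatter of the class means, which recovers LDA-like directions. The function name `supervised_ics` is hypothetical.

```python
import numpy as np

def supervised_ics(X, y):
    """Eigen-decompose S1^{-1} S2 (sketch; S1 and S2 are assumed choices)."""
    mu = X.mean(axis=0)
    S1 = np.cov(X, rowvar=False)           # unsupervised scatter
    S2 = np.zeros_like(S1)                 # supervised scatter (class means)
    for c in np.unique(y):
        Xc = X[y == c]
        diff = Xc.mean(axis=0) - mu
        S2 += (len(Xc) / len(X)) * np.outer(diff, diff)
    # Solve the symmetric problem L^{-1} S2 L^{-T} v = w v with S1 = L L^T,
    # then map back: b = L^{-T} v is an eigenvector of S1^{-1} S2.
    L = np.linalg.cholesky(S1)
    M = np.linalg.solve(L, np.linalg.solve(L, S2).T).T
    M = (M + M.T) / 2.0                    # guard against round-off asymmetry
    w, V = np.linalg.eigh(M)
    w, V = w[::-1], V[:, ::-1]             # sort eigenvalues in descending order
    return w, np.linalg.solve(L.T, V)
```

The columns of the returned matrix are ordered by decreasing eigenvalue, so the first few columns give the candidate directions for a reduced representation.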

  • Research Article
  • Citations: 5
  • 10.1214/10-ejs572
Sparse supervised dimension reduction in high dimensional classification
  • Jan 1, 2010
  • Electronic Journal of Statistics
  • Junhui Wang + 1 more

Supervised dimension reduction has proven effective in analyzing data with complex structure. The primary goal is to seek the reduced subspace of minimal dimension which is sufficient for summarizing the data structure of interest. This paper investigates the supervised dimension reduction in high dimensional classification context, and proposes a novel method for estimating the dimension reduction subspace while retaining the ideal classification boundary based on the original dataset. The proposed method combines the techniques of margin based classification and shrinkage estimation, and can estimate the dimension and the directions of the reduced subspace simultaneously. Both theoretical and numerical results indicate that the proposed method is highly competitive against its competitors, especially when the dimension of the covariates exceeds the sample size.

  • Research Article
  • Citations: 36
  • 10.1214/18-ejs1403
Supervised dimensionality reduction via distance correlation maximization
  • Jan 1, 2018
  • Electronic Journal of Statistics
  • Praneeth Vepakomma + 2 more

In our work, we propose a novel formulation for supervised dimensionality reduction based on a nonlinear dependency criterion called Statistical Distance Correlation (Székely et al., 2007). We propose an objective which is free of distributional assumptions on regression variables and regression model assumptions. Our proposed formulation is based on learning a low-dimensional feature representation $\mathbf{z}$, which maximizes the squared sum of Distance Correlations between low-dimensional features $\mathbf{z}$ and response $y$, and also between features $\mathbf{z}$ and covariates $\mathbf{x}$. We propose a novel algorithm to optimize our proposed objective using the Generalized Minimization Maximization method of Parizi et al. (2015). We show superior empirical results on multiple datasets, demonstrating the effectiveness of our proposed approach over several relevant state-of-the-art supervised dimensionality reduction methods.
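
As background for the dependency criterion named above, here is a minimal sketch of the sample distance correlation computed from double-centered pairwise distance matrices. It only evaluates the statistic for two given samples; it does not implement the paper's optimization over $\mathbf{z}$. The function names are illustrative.

```python
import numpy as np

def _centered_distances(a):
    """Pairwise Euclidean distance matrix, double-centered."""
    a = np.asarray(a, dtype=float)
    a = a.reshape(len(a), -1)              # treat 1-D input as n samples of dim 1
    D = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
    return (D - D.mean(axis=0, keepdims=True)
              - D.mean(axis=1, keepdims=True) + D.mean())

def distance_correlation(x, y):
    """Sample distance correlation between paired samples x and y."""
    A, B = _centered_distances(x), _centered_distances(y)
    dcov2 = max((A * B).mean(), 0.0)       # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(dcov2 / denom)) if denom > 0 else 0.0
```

Unlike Pearson correlation, this statistic can detect nonlinear dependence: for `x` evenly spaced on a symmetric interval and `y = x**2`, the Pearson correlation is essentially zero while the distance correlation is clearly positive.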

  • Book Chapter
  • Citations: 37
  • 10.1007/978-3-642-15711-0_83
Semi Supervised Multi Kernel (SeSMiK) Graph Embedding: Identifying Aggressive Prostate Cancer via Magnetic Resonance Imaging and Spectroscopy
  • Jan 1, 2010
  • Pallavi Tiwari + 3 more

With the wide array of multi-scale, multi-modal data now available for disease characterization, the major challenge in integrated disease diagnostics is to be able to represent the different data streams in a common framework while overcoming differences in scale and dimensionality. This common knowledge representation framework is an important prerequisite for developing integrated meta-classifiers for disease classification. In this paper, we present a unified data fusion framework, Semi Supervised Multi Kernel Graph Embedding (SeSMiK-GE). Our method allows for representation of individual data modalities via a combined multi-kernel framework followed by semi-supervised dimensionality reduction, where partial label information is incorporated to embed high dimensional data in a reduced space. In this work we evaluate SeSMiK-GE for distinguishing (a) benign from cancerous (CaP) areas, and (b) aggressive high-grade prostate cancer from indolent low-grade by integrating information from 1.5 Tesla in vivo Magnetic Resonance Imaging (anatomic) and Spectroscopy (metabolic). Comparing SeSMiK-GE with unimodal T2w, MRS classifiers and a previously published non-linear dimensionality reduction driven combination scheme (ScEPTre) yielded classification accuracies of (a) 91.3% (SeSMiK), 66.1% (MRI), 82.6% (MRS) and 86.8% (ScEPTre) for distinguishing benign from CaP regions, and (b) 87.5% (SeSMiK), 79.8% (MRI), 83.7% (MRS) and 83.9% (ScEPTre) for distinguishing high and low grade CaP over a total of 19 multi-modal MRI patient studies.

  • Research Article
  • Citations: 110
  • 10.1016/j.measurement.2015.11.047
Bearing remaining useful life estimation based on time–frequency representation and supervised dimensionality reduction
  • Feb 27, 2016
  • Measurement
  • Minghang Zhao + 2 more

  • Research Article
  • Citations: 3
  • 10.1016/j.neucom.2016.09.048
Moments discriminant analysis for supervised dimensionality reduction
  • Sep 30, 2016
  • Neurocomputing
  • K Ramachandra Murthy + 1 more

  • Single Report
  • Citations: 121
  • 10.21236/ada439511
Concept Indexing: A Fast Dimensionality Reduction Algorithm With Applications to Document Retrieval and Categorization
  • Mar 6, 2000
  • George Karypis + 1 more

In recent years, we have seen a tremendous growth in the volume of text documents available on the Internet, digital libraries, news sources, and company-wide intranets. This has led to an increased interest in developing methods that can efficiently categorize and retrieve relevant information. Retrieval techniques based on dimensionality reduction, such as Latent Semantic Indexing (LSI), have been shown to improve the quality of the information being retrieved by capturing the latent meaning of the words present in the documents. Unfortunately, the high computational requirements of LSI and its inability to compute an effective dimensionality reduction in a supervised setting limit its applicability. In this paper we present a fast dimensionality reduction algorithm, called concept indexing (CI), that is equally effective for unsupervised and supervised dimensionality reduction. CI computes a k-dimensional representation of a collection of documents by first clustering the documents into k groups, and then using the centroid vectors of the clusters to derive the axes of the reduced k-dimensional space. Experimental results show that the dimensionality reduction computed by CI achieves retrieval performance comparable to that obtained using LSI, while requiring an order of magnitude less time. Moreover, when CI is used to compute the dimensionality reduction in a supervised setting, it greatly improves the performance of traditional classification algorithms such as C4.5 and kNN.
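
The cluster-then-project idea behind concept indexing can be sketched as follows. This is an illustration under stated assumptions rather than the authors' algorithm: it clusters unit-normalized document vectors with a simple spherical k-means (the abstract does not say which clustering method CI uses) and takes each reduced coordinate to be the cosine similarity between a document and a cluster centroid. The function name `concept_indexing` and its parameters are hypothetical.

```python
import numpy as np

def concept_indexing(X, k, n_iter=20, seed=0):
    """Reduce rows of X to k dimensions via cluster centroids (sketch)."""
    rng = np.random.default_rng(seed)
    # Normalize documents to unit length so dot products are cosines.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)
    # Initialize centroids with k distinct documents.
    centroids = Xn[rng.choice(len(Xn), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each document to its most similar centroid.
        labels = np.argmax(Xn @ centroids.T, axis=1)
        for j in range(k):
            members = Xn[labels == j]
            if len(members):
                c = members.mean(axis=0)
                n = np.linalg.norm(c)
                if n > 0:
                    centroids[j] = c / n   # keep centroids unit length
    # The centroids define the axes of the reduced k-dimensional space.
    return Xn @ centroids.T, centroids
```

In the supervised setting described in the abstract, one natural variation (again an assumption here) would be to replace the clustering step with the known class labels, so that each axis corresponds to a class centroid.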

  • Conference Article
  • 10.1109/grc.2009.5255018
Orthogonal subspace based nonlinear correlation learning for supervised dimensionality reduction
  • Aug 1, 2009
  • Zhao Zhang + 3 more

Many problems in intelligent data analysis involve some form of dimensionality reduction. The paper discusses a new supervised dimensionality reduction method where samples are accompanied by class labels. We also show that it can be easily extended to non-linear dimensionality reduction scenarios via the kernel trick, and we then propose an effective orthogonal feature subspace and correlation learning based non-linear dimensionality reduction method called OSNCL, which is a way of measuring the nonlinear relationships between two multidimensional datasets and aims to find two sets of orthogonal bases, one for each dataset. In this setting, pairwise constraints are adopted to specify whether pairs of instances belong to the same class or not. OSNCL can project the multivariate data into a set of more useful features and preserve the intrinsic structure of the data and the pairwise constraints defined in the orthogonal feature subspaces, under which the projections of the data are easier to partition effectively. We also demonstrate the practical usefulness and high scalability of the OSNCL method in many data visualization tasks, and experimental results on a broad range of datasets show that the OSNCL method is superior to many established dimensionality reduction methods. After the dimensions of the samples are reduced, few of the clusters with different class labels lying in the orthogonal subspaces constructed by OSNCL are mixed with each other.

  • Research Article
  • 10.3390/math14020325
RMFGP: A Rotated Multi-Fidelity Gaussian Process Framework for Supervised Dimension Reduction
  • Jan 18, 2026
  • Mathematics
  • Jiahao Zhang + 2 more

High-dimensional surrogate modeling with limited high-fidelity data poses a major challenge in uncertainty quantification. Classical supervised dimension reduction methods often fail in this setting due to insufficient accurate observations, while low-fidelity data are abundant but biased. In this work, we propose a Rotated Multi-Fidelity Gaussian Process (RMFGP) framework that enables reliable dimension reduction and surrogate construction under severe data scarcity. The proposed method integrates nonlinear multi-fidelity Gaussian process regression with sliced average variance estimation (SAVE) to iteratively identify informative input directions. Low-fidelity data are first used to extract coarse structural information, which is exploited to rotate the input space prior to multi-fidelity model training. Predictions generated by the trained RMFGP surrogate are then used to refine the dimension reduction, allowing accurate estimation of the central sufficient dimension reduction subspace even when high-fidelity data are scarce. A Bayesian active learning strategy based on predictive uncertainty is further incorporated to adaptively select new high-fidelity samples. Numerical examples, including stochastic partial differential equations, demonstrate that RMFGP significantly improves prediction accuracy, convergence, and uncertainty propagation compared to existing Gaussian process-based dimension reduction approaches, while requiring substantially fewer high-fidelity evaluations.
