Abstract

Dimensionality reduction is a fundamental research topic in machine learning. This paper focuses on a dimensionality reduction technique that exploits semi-supervised information in the form of pairwise constraints; specifically, these constraints specify whether two instances belong to the same class or not. We propose two dual linear methods to accomplish dimensionality reduction under that setting. These two methods overcome the difficulty of simultaneously maximizing the between-class difference and minimizing the within-class difference by transforming the original data into a new space in such a way that the bi-objective problem is (almost) equivalently reduced to a single-objective problem. Empirical evaluations on a broad range of public datasets show that the two proposed methods are superior to several existing methods for semi-supervised dimensionality reduction.

Highlights

  • High-dimensional data are common in various machine learning applications, from text document and image processing [1]–[3] to biological data analysis [4], [5]

  • Because of the curse of dimensionality [6], dimensionality reduction methods are fundamental to the success of many machine learning algorithms

  • Representative pairwise constraint-based linear methods include: constraint-based Fisher Linear Discriminant (cFLD) [16], which is similar to FLD or SDA, but where the within-class difference is obtained by checking must-link constraints; and Semi-Supervised Dimensionality Reduction (SSDR) [17], which considers the differences of both must-link and cannot-link constraints


Summary

INTRODUCTION

High-dimensional data are common in various machine learning applications, from text document and image processing [1]–[3] to biological data analysis [4], [5]. Dimensionality reduction can exploit several forms of semi-supervised information, including pairwise constraints and incomplete class labels. Based on the type of semi-supervised information used, methods for dimensionality reduction can be categorised as label-based, pairwise constraint-based, and other types. Representative pairwise constraint-based linear methods include: constraint-based Fisher Linear Discriminant (cFLD) [16], which is similar to FLD or SDA, but where the within-class difference is obtained by checking must-link constraints (i.e., pairs of instances that are known to belong to the same class or have the same label); and Semi-Supervised Dimensionality Reduction (SSDR) [17], which considers the differences of both must-link and cannot-link constraints. SSDR maximizes αJB − βJW as its objective, which can be seen as striking a balance between the two objectives of maximizing JB (the between-class difference) and minimizing JW (the within-class difference).
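The SSDR-style objective above can be made concrete with a short sketch. Under a common formulation, JB sums squared differences over cannot-link pairs and JW over must-link pairs, so maximizing αJB − βJW for a linear projection reduces to an eigendecomposition of a signed scatter-like matrix. The function name, parameter names, and the exact weighting below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def ssdr_projection(X, must_link, cannot_link, alpha=1.0, beta=1.0, dim=2):
    """Sketch of an SSDR-style linear projection from pairwise constraints.

    X           : (n, d) data matrix
    must_link   : pairs (i, j) known to share a class
    cannot_link : pairs (i, j) known to differ in class
    """
    n, d = X.shape
    S = np.zeros((d, d))
    # Cannot-link differences contribute positively (maximize J_B).
    for i, j in cannot_link:
        diff = (X[i] - X[j]).reshape(-1, 1)
        S += alpha * (diff @ diff.T)
    # Must-link differences contribute negatively (minimize J_W).
    for i, j in must_link:
        diff = (X[i] - X[j]).reshape(-1, 1)
        S -= beta * (diff @ diff.T)
    # Maximizing w^T S w over orthonormal w selects the top eigenvectors of S.
    eigvals, eigvecs = np.linalg.eigh(S)   # S is symmetric by construction
    W = eigvecs[:, np.argsort(eigvals)[::-1][:dim]]
    return X @ W, W
```

Because the must-link term enters with a negative sign, a single eigenproblem trades off both objectives at once, which is the balance the αJB − βJW formulation expresses.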

CONTRIBUTION
PRELIMINARIES
DECREASING WITHIN-CLASS DIFFERENCES
RESULTS ON YALEB AND MNIST DATASETS
CONCLUSION