Abstract
Background: Manifold Learning (ML) has become essential in recent years for visualization and dimension reduction of expression data. ML assumes that the data points lie on a manifold embedded in Euclidean space, and its goal is to learn an inverse embedding function that recovers the manifold structure. Cancer classification is a widely investigated application of expression profile data (e.g., miRNA). The non-linear ML techniques commonly applied to expression data, such as t-distributed Stochastic Neighbor Embedding (t-SNE), Isomap, and Locally Linear Embedding (LLE), are intended for visualization and cannot easily be applied to classification or, more generally, to unseen data. Such methods are also unsupervised and do not factor in class information. Conventional linear ML methods, such as Principal Component Analysis (PCA), are easy to use and applicable to unseen data, but can learn only linear manifolds.
Results: We introduce a novel, non-linear ML technique that incorporates class memberships and is readily applicable to unseen data and classification. Specifically, we construct a series of neighborhood graphs that describe the manifold structure locally within each cluster and among the cluster centroids. Our objective function uses the neighborhood graph information to preserve the class separations and manifold structure in the reduced-dimension space, and we train a neural network to learn an explicit inverse embedding function. This allows for fast visualization and classification of unseen data. The technique is compared against similar methods from the literature that use neural networks for ML, and is shown to offer improved cancer classification performance on expression data from The Cancer Genome Atlas (TCGA) and The Cancer Proteome Atlas (TCPA).
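The graph-construction step described in the Results can be illustrated with a minimal sketch. The abstract does not specify how the neighborhood graphs are built, so everything below is an assumption made for illustration: k-nearest-neighbor graphs under Euclidean distance, class labels standing in for clusters, the neighbor counts `k_local` and `k_centroid`, and the hypothetical helper names `knn_edges` and `build_graphs`. The objective function and the neural network are not shown.

```python
import numpy as np


def knn_edges(points, k):
    """Return directed (i, j) index pairs linking each point to its k nearest neighbors."""
    n = len(points)
    k = min(k, n - 1)  # a point cannot have more than n - 1 neighbors
    if k <= 0:
        return np.empty((0, 2), dtype=int)
    # Pairwise squared Euclidean distances.
    diff = points[:, None, :] - points[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)  # exclude self-matches
    nbrs = np.argsort(d2, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    return np.column_stack([rows, nbrs.ravel()])


def build_graphs(X, y, k_local=5, k_centroid=2):
    """Build one local graph per class plus a graph over the class centroids.

    Assumption: kNN graphs are used; the source only says "neighborhood graphs".
    """
    local_edges, centroids = [], []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        edges = knn_edges(X[idx], k_local)
        local_edges.append(idx[edges])  # map class-local indices back to rows of X
        centroids.append(X[idx].mean(axis=0))
    centroids = np.vstack(centroids)
    centroid_edges = knn_edges(centroids, k_centroid)
    return np.vstack(local_edges), centroids, centroid_edges


# Synthetic stand-in for expression data: 3 classes, 20 samples each, 10 features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, size=(20, 10)) for m in (0.0, 5.0, 10.0)])
y = np.repeat([0, 1, 2], 20)
local_edges, centroids, centroid_edges = build_graphs(X, y)
```

In this sketch every local edge joins two samples of the same class, while the centroid graph carries the between-class structure. An objective of the kind described above could then penalize distortion along both edge sets in the reduced space.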
Conclusion: We present a new, supervised ML technique designed specifically for classification, which can be efficiently applied to unseen data. The results show promise on multiple large-scale expression data sets, and thus warrant further research into supervised ML methods for cancer classification and expression data visualization.
Citation Format: James Webber, Kevin Elias. Supervised manifold learning and classification; application to expression data visualization and cancer prediction [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 5372.