Spectral embedding finds meaningful (relevant) structure in image and microarray data

Brandon W Higgs, Jennifer Weller, Jeffrey L Solka

doi:10.1186/1471-2105-7-74


Open Access

https://doi.org/10.1186/1471-2105-7-74


Abstract

Background

Accurate methods for extraction of meaningful patterns in high dimensional data have become increasingly important with the recent generation of data types containing measurements across thousands of variables. Principal components analysis (PCA) is a linear dimensionality reduction (DR) method that is unsupervised in that it relies only on the data; projections are calculated in Euclidean or a similar linear space and do not use tuning parameters for optimizing the fit to the data. However, relationships within sets of nonlinear data types, such as biological networks or images, are frequently mis-rendered into a low dimensional space by linear methods. Nonlinear methods, in contrast, attempt to model important aspects of the underlying data structure, often requiring parameter(s) fitting to the data type of interest. In many cases, the optimal parameter values vary when different classification algorithms are applied on the same rendered subspace, making the results of such methods highly dependent upon the type of classifier implemented.

Results

We present the results of applying the spectral method of Lafon, a nonlinear DR method based on the weighted graph Laplacian, that minimizes the requirements for such parameter optimization for two biological data types. We demonstrate that it is successful in determining implicit ordering of brain slice image data and in classifying separate species in microarray data, as compared to two conventional linear methods and three nonlinear methods (one of which is an alternative spectral method). This spectral implementation is shown to provide more meaningful information, by preserving important relationships, than the methods of DR presented for comparison. Tuning parameter fitting is simple and is a general, rather than data type or experiment specific approach, for the two datasets analyzed here. Tuning parameter optimization is minimized in the DR step to each subsequent classification method, enabling the possibility of valid cross-experiment comparisons.

Conclusion

Results from the spectral method presented here exhibit the desirable properties of preserving meaningful nonlinear relationships in lower dimensional space and requiring minimal parameter fitting, providing a useful algorithm for purposes of visualization and classification across diverse datasets, a common challenge in systems biology.

Highlights

Accurate methods for extraction of meaningful patterns in high dimensional data have become increasingly important with the recent generation of data types containing measurements across thousands of variables
Tuning parameter optimization is minimized in the dimensionality reduction (DR) step to each subsequent classification method, enabling the possibility of valid cross-experiment comparisons
We examined the performance of a spectral method presented by Lafon [3,4] and have shown that it is successful in extracting meaningful structure in these two disparate data types, both having high dimensionality paired with low replication, with a method for calculating the tuning parameter that does not have to be varied across classifiers to achieve correct results

Summary

Introduction

Accurate methods for extraction of meaningful patterns in high dimensional data have become increasingly important with the recent generation of data types containing measurements across thousands of variables. Where local data structures are important to the interpretation of the experimental results but are not best summarized linearly, nonlinear methods that are kernel-based (e.g. kernel PCA) [6] or graph theoretic, such as spectral embedding [3,4,5,6,7,8,9,10,11,12,13,14,15,16,17], can be more appropriate. These methods attempt to model the underlying manifold by fitting a kernel parameter to optimize performance (e.g. as assessed by some accuracy metric) [6]. Such parameter modifications are optimized within a specific range of values that can be different for each classifier.
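To make the idea of a graph-theoretic spectral embedding with a single kernel parameter concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation, and it omits refinements such as density normalization. The function name `spectral_embedding`, the Gaussian kernel width `epsilon`, and the median-distance default for `epsilon` are assumptions made for illustration only; the summary above does not say how the tuning parameter is calculated.

```python
import numpy as np

def spectral_embedding(X, n_components=2, epsilon=None):
    """Embed the rows of X using eigenvectors of a normalized Gaussian affinity graph.

    Illustrative sketch only; names and defaults are hypothetical.
    """
    X = np.asarray(X, dtype=float)

    # Pairwise squared Euclidean distances between samples.
    sq_norms = np.sum(X ** 2, axis=1)
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    D2 = np.maximum(D2, 0.0)  # guard against small negative round-off

    if epsilon is None:
        # Assumed heuristic (not from the paper): median off-diagonal squared distance.
        off_diag = D2[~np.eye(len(X), dtype=bool)]
        epsilon = np.median(off_diag)

    # Weighted graph: Gaussian affinities, then node degrees.
    W = np.exp(-D2 / epsilon)
    d = W.sum(axis=1)

    # Symmetric normalization S = D^{-1/2} W D^{-1/2}; S has the same
    # eigenvalues as the random-walk matrix P = D^{-1} W.
    S = W / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(S)

    # Sort eigenpairs from largest to smallest eigenvalue.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Convert to right eigenvectors of P: psi = D^{-1/2} v.
    psi = eigvecs / np.sqrt(d)[:, None]

    # Drop the trivial constant eigenvector and scale by the eigenvalues.
    embedding = psi[:, 1:n_components + 1] * eigvals[1:n_components + 1]
    return embedding, eigvals

# Usage: points sampled along an arc carry an implicit ordering (the angle t).
t = np.linspace(0.0, np.pi, 40)
X = np.column_stack([np.cos(t), np.sin(t)])
Y, vals = spectral_embedding(X, n_components=2)
print("correlation of first coordinate with t:", np.corrcoef(t, Y[:, 0])[0, 1])
```

In this sketch the only quantity to tune is `epsilon`, and it is set from the data before any classifier is involved, which mirrors the point that the parameter need not be re-optimized for each downstream classification method. The sign of each embedding coordinate is arbitrary, so only the magnitude of the printed correlation is meaningful.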



Journal: BMC Bioinformatics
Publication Date: Feb 16, 2006
Citations: 39
License type: cc-by


