Abstract
BackgroundLife processes are determined by the organism's genetic profile and multiple environmental variables. However the interaction between these factors is inherently non-linear [1]. Microarray data is one representation of the nonlinear interactions among genes and genes and environmental factors. Still most microarray studies use linear methods for the interpretation of nonlinear data. In this study, we apply Isomap, a nonlinear method of dimensionality reduction, to analyze three independent large Affymetrix high-density oligonucleotide microarray data sets.ResultsIsomap discovered low-dimensional structures embedded in the Affymetrix microarray data sets. These structures correspond to and help to interpret biological phenomena present in the data. This analysis provides examples of temporal, spatial, and functional processes revealed by the Isomap algorithm. In a spinal cord injury data set, Isomap discovers the three main modalities of the experiment – location and severity of the injury and the time elapsed after the injury. In a multiple tissue data set, Isomap discovers a low-dimensional structure that corresponds to anatomical locations of the source tissues. This model is capable of describing low- and high-resolution differences in the same model, such as kidney-vs.-brain and differences between the nuclei of the amygdala, respectively. In a high-throughput drug screening data set, Isomap discovers the monocytic and granulocytic differentiation of myeloid cells and maps several chemical compounds on the two-dimensional model.ConclusionVisualization of Isomap models provides useful tools for exploratory analysis of microarray data sets. In most instances, Isomap models explain more of the variance present in the microarray data than PCA or MDS. Finally, Isomap is a promising new algorithm for class discovery and class prediction in high-density oligonucleotide data sets.
Highlights
Life processes are determined by the organism's genetic profile and multiple environmental variables
The DiGiovanni et al study used a stringent inclusion threshold that included only those genes present in at least 40% of all the samples. In addition to this first filter, a second filter was applied that eliminated genes that did not have a change in expression level of at least two-fold compared to that of the sham controls, which was determined with Welch ANOVA t-test
We do not use stringent selection criteria at the level of data scrubbing, because these filters may introduce confounding into the data set, which can lead to separation between sample classes based on subjective filtering criteria rather than existing biological phenomena
Summary
Life processes are determined by the organism's genetic profile and multiple environmental variables. We apply Isomap, a nonlinear method of dimensionality reduction, to analyze three independent large Affymetrix high-density oligonucleotide microarray data sets. With the widespread use of microarrays in basic research and their increasing use in medical diagnostics, biomedical researchers can anticipate lower costs for chips that will lead to more studies utilizing hundreds, if not thousands, of samples. This expansion in sample size will provide researchers with higher resolution insights into biological processes as they are reflected in temporal, spatial, and functional patterns in microarray data sets. Several types of pattern recognition and clustering techniques have been developed and applied to microarray data
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.