Abstract

Microarray experiments are capable of yielding observations for thousands of genes that are differentially expressed under several conditions. Although changes in gene expression profiles can be measured simultaneously at the whole-genome scale, interpreting an individual gene expression profile in terms of its actual biological function or associated biochemical processes remains challenging. Exploratory multivariate statistical techniques such as principal component analysis have been used extensively to reduce the complexity of large microarray datasets. Although Saccharomyces cerevisiae is the species most widely studied with microarray techniques, a complete understanding of the efficacy of principal component analysis and data pre-processing for clustering and functional mapping of the yeast gene expression profiles reported in various studies is still lacking. Therefore, in this work we evaluate the impact of data pre-processing and principal component analysis on k-means clustering-based functional mapping of yeast gene expression profiles observed during the diauxic shift. Two time-series gene expression datasets, (1) yeast diauxic-shift data and (2) yeast sporulation data, were chosen to examine the efficacy of principal component analysis in interpreting gene-based or score-based clusters and their relationship with known pathways. It was shown that, unlike conventional pre-processing, principal component analysis provides a powerful tool to capture most of the information using only two component variables for the analysis of gene expression time-course data. Using yeast genome databases, it was demonstrated that clustering with principal components instead of the original variables does not necessarily improve cluster quality but helps in identifying the relationships between the genes of a cluster and key biological processes of the diauxic shift.
Overall, the present analysis is useful for mining high-dimensional microarray data at a reduced computational cost associated with functional enrichment of expression time series, regardless of species or experimental conditions.
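
The workflow summarised above (projection of expression time courses onto two principal components, followed by k-means clustering of the scores) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the helper names `pca_scores` and `kmeans` are hypothetical, the choice of three clusters is an assumption made for illustration, and the only pre-processing shown is column centring.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Return principal component scores and explained-variance ratios.

    Rows of X are genes, columns are time points (an assumed layout).
    """
    Xc = X - X.mean(axis=0)                      # centre each time point
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)        # fraction of total variance
    return scores, explained[:n_components]

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns cluster labels and centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance of every point to every centroid, shape (n_points, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Synthetic stand-in for log-ratio expression data: 60 genes x 7 time points,
# built from three assumed patterns (induced, repressed, unchanged) plus noise.
rng = np.random.default_rng(42)
t = np.linspace(0.0, 1.0, 7)
patterns = np.vstack([3.0 * t, -3.0 * t, np.zeros_like(t)])
X = np.repeat(patterns, 20, axis=0) + rng.normal(scale=0.2, size=(60, 7))

scores, explained = pca_scores(X, n_components=2)   # score-based representation
labels, centroids = kmeans(scores, k=3)             # cluster genes on two PCs

print("variance explained by PC1 and PC2:", explained)
print("cluster sizes:", np.bincount(labels, minlength=3))
```

Clustering the two-column score matrix rather than the full time-course matrix is what reduces the cost of the distance computations; whether the resulting clusters are of better quality is a separate question, as the abstract notes.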

