Abstract
Analyzing data obtained from genome-wide gene expression experiments is challenging due to the large number of variables, the need for multivariate analyses, and the demands of managing large amounts of data. Here we present the R package pcaGoPromoter, which facilitates the interpretation of genome-wide expression data and addresses these problems. In the first step, principal component analysis (PCA) is applied to survey any differences between experiments and possible groupings. The next step is the interpretation of the principal components with respect to both biological function and regulation by predicted transcription factor binding sites. The robustness of the results is evaluated using cross-validation, and illustrative plots of PCA scores and gene ontology terms are available. pcaGoPromoter works with any platform that uses gene symbols or Entrez IDs as probe identifiers. In addition, support for several popular Affymetrix GeneChip platforms is provided. To illustrate the features of the pcaGoPromoter package, a serum stimulation experiment was performed and the genome-wide gene expression in the resulting samples was profiled using the Affymetrix Human Genome U133 Plus 2.0 chip. Array data were analyzed using the pcaGoPromoter package tools, resulting in a clear separation of the experiments into three groups: controls, serum only, and serum with inhibitor. Functional annotation of the axes in the PCA score plot showed the expected serum-promoted biological processes, e.g., cell cycle progression, and the predicted involvement of expected transcription factors, including E2F. In addition, unexpected results, e.g., cholesterol synthesis in serum-depleted cells and NF-κB activation in inhibitor-treated cells, were noted. In summary, the pcaGoPromoter R package provides a collection of tools for analyzing gene expression data.
These tools give an overview of the input data via PCA, functional interpretation by gene ontology terms (biological processes), and an indication of the involvement of possible transcription factors.
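The first step described above, PCA of the expression matrix followed by ranking genes by their contribution to a component, can be sketched as follows. This is only an illustration written in Python with NumPy; it does not use or reproduce the pcaGoPromoter API. The data are synthetic, and the group labels, matrix dimensions, and effect sizes are assumptions chosen to mimic a three-group design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic (assumed) expression matrix: 9 samples (rows) x 200 genes (columns).
n_genes = 200
groups = ["control"] * 3 + ["serum"] * 3 + ["serum+inhibitor"] * 3
X = rng.normal(0.0, 0.3, size=(len(groups), n_genes))
X[3:6, :20] += 3.0    # genes 0-19 raised in the serum-only samples
X[6:9, 20:40] += 3.0  # genes 20-39 raised in the serum + inhibitor samples

# PCA via singular value decomposition of the gene-wise centred matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S                       # sample coordinates on each component
loadings = Vt.T                      # gene weights, one column per component
explained = S**2 / np.sum(S**2)      # fraction of variance per component

# Rank genes by their PC1 loading; the two ends of this ranking are the
# gene lists that a functional annotation step would take as input.
order = np.argsort(loadings[:, 0])
most_negative = order[:10]
most_positive = order[-10:][::-1]

for label, row in zip(groups, scores):
    print(f"{label:16s} PC1={row[0]:7.2f} PC2={row[1]:7.2f}")
print("variance explained by PC1, PC2:", explained[:2].round(2))
```

Samples from the same group should land close together in the PC1/PC2 plane, and the genes with the most extreme loadings are the ones driving the separation. The sign of each component is arbitrary, so only the relative positions of the groups are meaningful.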
Highlights
Working with genome-wide gene expression data is challenging for the typical molecular biologist, whose training focuses mainly on laboratory techniques and only to a lesser extent on mathematics or biostatistics
We have previously demonstrated that principal component analysis (PCA) can provide an experiment-oriented view in combination with a functional interpretation of the PCA axes with respect to transcription factor involvement and biological function [12,13,14]
We describe a serum stimulation experiment using human monocytes that was designed to illustrate the use of the pcaGoPromoter package algorithms and tools
Summary
Working with genome-wide gene expression data is challenging for the typical molecular biologist, whose training focuses mainly on laboratory techniques and only to a lesser extent on mathematics or biostatistics. An example of an experiment requiring genome-wide gene expression analysis is the extraction of RNA from a tissue sample taken in situ or from an ex vivo cultured cell line. The differences in mRNA levels between the different samples can be ascribed to three different effects: consequences of cellular signal transduction, cellular differentiation, or the migration of cells into or out of the tissue. Under these circumstances, key transcription factors are responsible for establishing differences in the mRNA levels.