Abstract

Background: Technological improvements have shifted the focus from data generation to data analysis. The availability of large amounts of data from transcriptomics, proteomics and metabolomics experiments raises new questions concerning suitable integrative analysis methods. We compare three integrative analysis techniques, co-inertia analysis (CIA), generalized singular value decomposition (GSVD) and integrative biclustering (IBC), by applying them to gene and protein abundance data from the six life cycle stages of Plasmodium falciparum. CIA is an analysis method used to visualize and explore gene and protein data. GSVD has shown its potential in the analysis of two transcriptome data sets. IBC applies biclustering to gene and protein data.

Results: Using CIA, we visualize the six life cycle stages of Plasmodium falciparum, as well as GO terms, in a 2D plane and interpret the spatial configuration. With GSVD, we decompose the transcriptomic and proteomic data sets into matrices with biologically meaningful interpretations and explore the processes captured by the data sets. IBC identifies groups of genes, proteins, GO terms and life cycle stages of Plasmodium falciparum. We show method-specific results as well as a network view of the life cycle stages based on the results common to all three methods. Additionally, by combining the results of the three methods, we create a three-fold validated network of life cycle stage-specific GO terms: sporozoites are associated with transcription and transport; merozoites with entry into host cell as well as biosynthetic and metabolic processes; rings with oxidation-reduction processes; trophozoites with glycolysis and energy production; schizonts with antigenic variation and immune response; gametocytes with DNA packaging and mitochondrial transport. Furthermore, the network connectivity underlines the separation of the intraerythrocytic cycle from the gametocyte and sporozoite stages.

Conclusion: Using integrative analysis techniques, we can integrate knowledge from different levels and obtain a wider view of the system under study. The overlap between method-specific and common results is considerable, even though the underlying mathematical assumptions are very different. The three-fold validated network of life cycle stage characteristics of Plasmodium falciparum identified a large number of the known associations from the literature in a single study.
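
To give a rough sense of the idea behind co-inertia analysis, the sketch below finds paired axes in two tables measured on the same samples such that the covariance between the projected samples is maximal. It is a simplified illustration and not the pipeline used in the study: it assumes column-centred tables with uniform sample weights (a PCA-like setting), and the function name `co_inertia` and the random toy data are hypothetical.

```python
import numpy as np

def co_inertia(X, Y):
    """Simplified co-inertia sketch (assumption: uniform weights, centring only).

    X: (n_samples, p) table, e.g. transcript abundances per stage.
    Y: (n_samples, q) table, e.g. protein abundances per stage.
    Returns sample scores for each table and the singular values.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)          # centre each variable
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc / n                # p x q cross-covariance matrix
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    x_scores = Xc @ U                # samples projected on the X axes
    y_scores = Yc @ Vt.T             # samples projected on the Y axes
    return x_scores, y_scores, s

# Toy data (hypothetical): 6 samples standing in for the six life cycle stages.
rng = np.random.default_rng(0)
genes = rng.normal(size=(6, 20))
proteins = 0.8 * genes[:, :10] + rng.normal(scale=0.1, size=(6, 10))

x_scores, y_scores, s = co_inertia(genes, proteins)
```

The first two columns of `x_scores` and `y_scores` give 2D coordinates for each sample in the two data sets, and the covariance between the paired first axes equals the leading singular value `s[0]`.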

Highlights

  • Technological improvements have shifted the focus from data generation to data analysis

  • With generalized singular value decomposition (GSVD), we decompose the data sets into matrices with biologically meaningful interpretations and explore the processes captured by the data sets

  • Proteins were detected by multidimensional protein identification technology (MudPIT), and protein abundance was estimated by the number of MS/MS spectra identified per protein


Summary

Introduction

Technological improvements have shifted the focus from data generation to data analysis. Other approaches are based on network analysis [20,21] and statistical methods such as analysis of variation, clustering and gene set enrichment [22,23,24]. In a study by Hahne and colleagues [22], analysis of variation, k-means clustering and functional annotation were applied to transcriptome and proteome data from salt-stressed B. subtilis cells. They showed a well-coordinated induction of gene expression and changes in protein levels as the result of a severe salt shock. Other integrative approaches can be found in [27,28,29] for omics data in general and in [19] for transcriptome and proteome data in particular.

