Abstract

The field of transcriptomics uses and measures mRNA as a proxy for gene expression. Two major platforms are currently used to quantify mRNA: microarrays and RNA-Seq. Many comparative studies have shown that their results are not always consistent. In this study, we aimed to find a robust method that increases the comparability of the two platforms, enabling analysis of merged data from both. We transformed high-dimensional transcriptomics data from the two platforms into a lower-dimensional, biologically relevant dataset by calculating enrichment scores, based on gene set collections, for all samples. We compared the similarity between data from both platforms, both on the raw data and on the enrichment scores. We show that this transformation represents the data in a biologically relevant way and filters out noise, which leads to increased platform concordance. We validated the procedure by using predictive models built with microarray-based enrichment scores to predict breast cancer subtypes from enrichment scores based on RNA-Seq data. Although microarray and RNA-Seq expression levels may appear different, transforming them into biologically relevant gene set enrichment scores significantly increases their correlation, which is a step forward in integrating data from the two platforms. The gene set collections were shown to contain biologically relevant gene sets. More in-depth investigation of the effect of the composition, size, and number of the gene sets used for the transformation is suggested for future research.
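The abstract does not name the enrichment-scoring method, so the sketch below assumes a simple stand-in and should not be read as the authors' procedure. Each gene is z-scored across samples within its platform, and a gene set's score for a sample is the mean z-score of its member genes. This is a basic aggregation rule, not GSVA or ssGSEA. Concordance is measured as the average Pearson correlation across samples between matched features on the two platforms. All function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence

# Hypothetical layout: feature name -> values, one per sample, same sample order on both platforms.
Matrix = Dict[str, List[float]]


def zscore_features(expr: Matrix) -> Matrix:
    """Standardize each gene across samples, removing platform-specific scale and offset."""
    out: Matrix = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # A gene with zero variance carries no between-sample signal; map it to 0.
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def enrichment_scores(expr: Matrix, gene_sets: Dict[str, Sequence[str]]) -> Matrix:
    """Assumed scoring rule: per sample, the mean z-score of the set's genes measured on this platform."""
    z = zscore_features(expr)
    scores: Matrix = {}
    for name, genes in gene_sets.items():
        members = [z[g] for g in genes if g in z]
        if members:  # skip sets with no measured genes
            # zip(*members) yields one tuple of member z-scores per sample
            scores[name] = [mean(per_sample) for per_sample in zip(*members)]
    return scores


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; a constant profile is treated as uncorrelated (returns 0.0)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def mean_concordance(a: Matrix, b: Matrix) -> float:
    """Average cross-platform correlation over features present on both platforms."""
    shared = sorted(set(a) & set(b))
    return mean(pearson(a[k], b[k]) for k in shared)


if __name__ == "__main__":
    # Toy data (not from the study): 2 genes x 4 samples sharing the signal [1, 2, 3, 4]
    # plus opposite-sign noise. The RNA-Seq profiles are rescaled (10x + 5) and the
    # two genes carry the opposite noise sign to their array counterparts.
    array = {"A": [2, 1, 4, 3], "B": [0, 3, 2, 5]}
    rnaseq = {"A": [5, 35, 25, 55], "B": [25, 15, 45, 35]}
    sets = {"toy_set": ["A", "B"]}
    print(round(mean_concordance(array, rnaseq), 3))  # per-gene concordance: 0.124
    print(round(mean_concordance(enrichment_scores(array, sets),
                                 enrichment_scores(rnaseq, sets)), 3))  # per-set: 1.0
```

On this toy input, the per-gene cross-platform correlation is only about 0.124. The single gene-set score correlates perfectly, because averaging the two member genes cancels their opposite-sign noise. This only illustrates the noise-filtering argument made above. It does not reproduce the study's data or results.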

Introduction

To determine the cellular activity of a culture or tissue, the field of transcriptomics currently has two major platforms at its disposal, namely microarrays and RNA-Seq. As a proxy for gene expression, both platforms can be used to quantify all protein-coding transcripts (mRNA) present in a sample. In sequencing by synthesis, the type of nucleotide (A, C, T, or G) incorporated into the growing strand is identified by its fluorescent label. The label is then cleaved off, which allows extension by the next nucleotide. This cycle is repeated in a massively parallel fashion. The resulting reads are typically aligned to a reference genome. If a reference genome is not available, de novo transcriptome assembly is possible, provided coverage and sequencing depth are adequate.
