Abstract
With the fast development of high-throughput sequencing technologies, a new generation of genome-wide gene expression measurements is under way. This is based on mRNA sequencing (RNA-seq), which complements the already mature technology of microarrays and is expected to overcome some of the latter’s disadvantages. These RNA-seq data pose new challenges, however, as their strengths and weaknesses have yet to be fully identified. Ideally, Next (or Second) Generation Sequencing measures can be integrated with microarray data for more comprehensive gene expression investigation, to facilitate analysis of whole regulatory networks. At present, however, the nature of these data is not very well understood. In this paper we study three alternative gene expression time series datasets for Drosophila melanogaster embryo development, in order to compare three measurement techniques: RNA-seq, single-channel microarrays and dual-channel microarrays. The aim is to study the state of the art for the three technologies, with a view to assessing overlapping features, data compatibility and integration potential in the context of time series measurements. This involves using established tools for each of the three technologies, and technical and biological replicates (for RNA-seq and microarrays, respectively), owing to the limited availability of biological RNA-seq replicates for time series data. The approach consists of a sensitivity analysis for differential expression and clustering. In general, the RNA-seq dataset displayed the highest sensitivity to differential expression. The single-channel data performed similarly for the differentially expressed genes common to the gene sets considered. Cluster analysis was used to identify different features of the gene space for the three datasets, with higher similarities found between the RNA-seq and single-channel microarray datasets.
Highlights
Analysis of the gene expression process has been an important topic for many years [1], as it can have outcomes important for understanding the way in which genetic information is processed, as well as the mechanisms involved in both natural and abnormal processes
Differential expression: in the first analysis performed, we studied the sets of differentially expressed (DE) genes obtained from the different datasets with a q-value under 0.01
The gene sets should show significant overlap and should be similar in size; in reality, this depends on the biological variability, the measurement parameters and the DE test, so gene sets vary from one dataset to another
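The comparison of DE gene sets described above can be illustrated with a minimal sketch. It assumes that q-values are obtained by the Benjamini-Hochberg adjustment of per-gene p-values (one common definition; the source does not state which procedure was used), and it measures overlap with the Jaccard index. The gene names, p-values and function names are hypothetical and serve only as an illustration.

```python
from typing import Dict, Set


def bh_qvalues(pvalues: Dict[str, float]) -> Dict[str, float]:
    """Benjamini-Hochberg adjusted p-values (assumed here as the q-value)."""
    items = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(items)
    qvals: Dict[str, float] = {}
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        gene, p = items[rank - 1]
        running_min = min(running_min, p * m / rank)
        qvals[gene] = running_min
    return qvals


def de_set(pvalues: Dict[str, float], threshold: float = 0.01) -> Set[str]:
    """Genes whose q-value falls under the threshold (0.01 in the text)."""
    return {g for g, q in bh_qvalues(pvalues).items() if q < threshold}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two gene sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical per-gene p-values from two platforms (illustration only).
rnaseq_p = {"geneA": 0.0001, "geneB": 0.002, "geneC": 0.004, "geneD": 0.5}
array_p = {"geneA": 0.0005, "geneB": 0.001, "geneC": 0.2, "geneD": 0.6}

rnaseq_de = de_set(rnaseq_p)
array_de = de_set(array_p)
print(sorted(rnaseq_de), sorted(array_de), jaccard(rnaseq_de, array_de))
```

In this toy example the two sets differ in size and overlap only partially, which is the situation the highlight describes for real datasets.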
Summary
Analysis of the gene expression process has been an important topic for many years [1], as it can have outcomes important for understanding the way in which genetic information is processed, as well as the mechanisms involved in both natural and abnormal processes. Recent advances in high-throughput sequencing technologies (Next or Second Generation Sequencing) have introduced a new alternative to microarrays, namely RNA-seq [4]. This quantifies gene expression by sequencing short strands of cDNA, aligning the sequences obtained back to the genome or transcriptome, and counting the aligned reads for each gene. This technology is expected to overcome some of the disadvantages of microarrays. Although significant efforts have been made to modify algorithms and technologies, problems still exist with obtaining quantified transcription data. Some of these relate to read errors, short read mapping, SNPs, RNA splicing and sequencing depth, which affect analysis of more complex transcriptomes [4]. Improvements are expected as the length of reads is increased [5] and new algorithms and methods are developed, so that RNA-seq will eventually become a more accessible tool for gene expression analysis.
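The counting step of RNA-seq quantification mentioned above can be sketched as follows. This is a simplified illustration, not the pipeline used in the paper: it assumes reads are already aligned and represented as half-open genomic intervals, and it discards reads that overlap no gene or more than one gene. The gene coordinates, read positions and the `count_reads` helper are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (chromosome, start, end) with a half-open interval [start, end)
Interval = Tuple[str, int, int]


def count_reads(genes: Dict[str, Interval], reads: List[Interval]) -> Counter:
    """Count aligned reads per gene, skipping ambiguous or unassigned reads."""
    counts = Counter({gene: 0 for gene in genes})
    for chrom, r_start, r_end in reads:
        # A read hits a gene if they share a chromosome and their intervals overlap.
        hits = [
            gene
            for gene, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and r_start < g_end and g_start < r_end
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
    return counts


# Hypothetical annotation and aligned reads (illustration only).
genes = {"geneA": ("2L", 100, 500), "geneB": ("2L", 450, 900)}
reads = [
    ("2L", 120, 156),  # inside geneA
    ("2L", 300, 336),  # inside geneA
    ("2L", 460, 496),  # overlaps both genes, so it is skipped
    ("2L", 800, 836),  # inside geneB
    ("3R", 120, 156),  # no annotated gene on this chromosome
]

counts = count_reads(genes, reads)
print(dict(counts))
```

The skipped read illustrates, on a small scale, why short read mapping is listed among the open problems: a read that cannot be assigned to a single gene contributes nothing to the counts under this simple rule.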