Abstract

BackgroundDynamic expression data, nowadays obtained using high-throughput RNA sequencing, are essential to monitor transient gene expression changes and to study the dynamics of their transcriptional activity in the cell or response to stimuli. Several methods for data selection, clustering and functional analysis are available; however, these steps are usually performed independently, without exploiting and integrating the information derived from each step of the analysis.MethodsHere we present FunPat, an R package for time series RNA sequencing data that integrates gene selection, clustering and functional annotation into a single framework. FunPat exploits functional annotations by performing for each functional term, e.g. a Gene Ontology term, an integrated selection-clustering analysis to select differentially expressed genes that share, besides annotation, a common dynamic expression profile.ResultsFunPat performance was assessed on both simulated and real data. With respect to a stand-alone selection step, the integration of the clustering step is able to improve the recall without altering the false discovery rate. FunPat also shows high precision and recall in detecting the correct temporal expression patterns; in particular, the recall is significantly higher than hierarchical, k-means and a model-based clustering approach specifically designed for RNA sequencing data. Moreover, when biological replicates are missing, FunPat is able to provide reproducible lists of significant genes. The application to real time series expression data shows the ability of FunPat to select differentially expressed genes with high reproducibility, indirectly confirming high precision and recall in gene selection. Moreover, the expression patterns obtained as output allow an easy interpretation of the results.ConclusionsA novel analysis pipeline was developed to search the main temporal patterns in classes of genes similarly annotated, improving the sensitivity of gene selection by integrating the statistical evidence of differential expression with the information on temporal profiles and the functional annotations. Significant genes are associated to both the most informative functional terms, avoiding redundancy of information, and the most representative temporal patterns, thus improving the readability of the results. FunPat package is provided in R/Bioconductor at link: http://sysbiobig.dei.unipd.it/?q=node/79.

Highlights

  • Dynamic expression data, nowadays obtained using high-throughput RNA sequencing, are essential to monitor transient gene expression changes and to study the dynamics of their transcriptional activity in the cell or response to stimuli

  • FunPat pipeline FunPat takes as input the expression data and the functional annotations organized according to Gene Sets, which can be defined as Gene Ontology (GO) terms, pathways or other sets depending on the annotation database used as input

  • We considered the stand-alone application of the BoundedArea method in order to evaluate if the integration of gene selection with the clustering step and the functional annotation is able to improve the recall without loss in precision

Read more

Summary

Introduction

Nowadays obtained using high-throughput RNA sequencing, are essential to monitor transient gene expression changes and to study the dynamics of their transcriptional activity in the cell or response to stimuli. Gene expression regulation is an intrinsically dynamic phenomenon, whose characteristics can be investigated using dynamic expression data In this context, microarray technique, RNA-seq avoids the design of specific probes, enabling a higher number of transcripts to be measured on a wider dynamic range. Among others, Trimmed Mean of M-values (TMM) [6] provides scaling factors to correct the library sizes calculated as a weighted mean of log ratios after filtering out the most expressed genes and the genes with the largest log ratios This approach has been recently shown to prevent loss of statistical power in the analysis of RNA-seq data when high-count genes are present [7]

Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.