Abstract

Background: RNA-Seq technology is routinely used to characterize the transcriptome, and to detect gene expression differences among cell types, genotypes and conditions. Advances in short-read sequencing instruments such as Illumina Next-Seq have yielded easy-to-operate machines, with high throughput, at a lower price per base. However, processing this data requires bioinformatics expertise to tailor and execute specific solutions for each type of library preparation.

Results: In order to enable fast and user-friendly data analysis, we developed an intuitive and scalable transcriptome pipeline that executes the full process, starting from cDNA sequences derived by RNA-Seq [Nat Rev Genet 10:57-63, 2009] and bulk MARS-Seq [Science 343:776-779, 2014] and ending with sets of differentially expressed genes. Output files are placed in structured folders, and results summaries are provided in rich and comprehensive reports, containing dozens of plots, tables and links.

Conclusion: Our User-friendly Transcriptome Analysis Pipeline (UTAP) is an open source, web-based intuitive platform available to the biomedical research community, enabling researchers to efficiently and accurately analyse transcriptome sequence data.

Highlights

  • RNA-Seq technology is routinely used to characterize the transcriptome, and to detect gene expression differences among cell types, genotypes and conditions.

  • The massive amount of data created by next-generation sequencing (NGS) requires bioinformatics expertise to tailor specific solutions for each type of library preparation.

  • Implementing the solutions typically requires scripting and running commands in the Linux environment. An example of such protocols can be seen at [8]. To address this challenge and simplify the analysis, we developed a transcriptome pipeline, with an intuitive user interface (Fig. 1; results in supplementary materials; demonstration)


Summary

Results

Our User-friendly Transcriptome Analysis Pipeline (UTAP) requires minimal user intervention. The pipeline runs the following steps (see Fig. 2 and examples in supplementary materials): demultiplexing, adapter and low-quality trimming, quality checks, mapping to a genome, gene quantification, UMI counting (if required), normalization, and detection of statistically significant differentially expressed genes (DEGs) for pairwise comparisons.

The pipeline's report closes with a description of the databases, tools and parameters used, and links to additional results. All pipeline outputs, such as trimmed FASTQ files, mapped and indexed BAM files, and matrices of raw counts, normalized counts and statistical DEG values, are available in structured folders.

Other existing platforms have one or more of the following limitations: they lack a friendly graphical user interface, are not scalable, have complex installations, do not provide predefined pipelines, do not provide meticulous ways to detect differentially expressed genes, or do not have structured outputs. Our future plans include improving customization by providing options to modify parameters via the web interface, adding NGS pipelines such as small RNAs, ChIP-Seq, ATAC-Seq, Ribo-Seq, SNP detection in RNA-Seq and single-cell RNA-Seq, and adapting the pipeline to run on other types of computing clusters and in the cloud.
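To make the UMI counting and normalization steps more concrete, the following is a minimal Python sketch. It is not UTAP's implementation: the function names, the input layout of (sample, gene, UMI) triples, and the toy reads are all assumptions made for illustration. Counts-per-million scaling is used here only as a simple stand-in for normalization; the summary above does not state which normalization method the pipeline applies.

```python
from collections import defaultdict


def count_umis(records):
    """Count distinct UMIs per gene in each sample.

    `records` is an iterable of (sample, gene, umi) triples, one per mapped
    read (a hypothetical layout). Reads sharing the same sample, gene and UMI
    are treated as PCR duplicates and counted once.
    """
    seen = defaultdict(set)
    for sample, gene, umi in records:
        seen[(sample, gene)].add(umi)

    counts = defaultdict(dict)
    for (sample, gene), umis in seen.items():
        counts[sample][gene] = len(umis)
    return dict(counts)


def counts_per_million(gene_counts):
    """Scale one sample's counts so that they sum to one million."""
    total = sum(gene_counts.values())
    if total == 0:
        return {gene: 0.0 for gene in gene_counts}
    return {gene: c * 1_000_000 / total for gene, c in gene_counts.items()}


# Toy input: the second read repeats the UMI of the first, so it is a duplicate.
reads = [
    ("ctrl", "GeneA", "AACG"),
    ("ctrl", "GeneA", "AACG"),
    ("ctrl", "GeneA", "TTGC"),
    ("ctrl", "GeneB", "GGAT"),
    ("treated", "GeneA", "CCTA"),
]

umi_counts = count_umis(reads)
cpm = {sample: counts_per_million(c) for sample, c in umi_counts.items()}
```

In this toy input, the three `ctrl` reads for `GeneA` collapse to two UMIs because two of them share the same barcode. Detecting statistically significant DEGs additionally requires replicates and a statistical model, which this sketch does not attempt.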
