Abstract
Background
Large-scale sequencing experiments are complex and require a wide spectrum of computational tools to extract and interpret relevant biological information. This is especially true in projects where individual processing and integrated analysis of both small RNA and complementary RNA data are needed. Such studies would benefit from a computational workflow that is easy to implement and standardizes the processing and analysis of both sequenced data types.
Results
We developed SePIA (Sequence Processing, Integration, and Analysis), a comprehensive small RNA and RNA workflow. It provides ready execution for over 20 commonly known RNA-seq tools on top of an established workflow engine and provides dynamic pipeline architecture to manage, individually analyze, and integrate both small RNA and RNA data. Implementation with Docker makes SePIA portable and easy to run. We demonstrate the workflow’s extensive utility with two case studies involving three breast cancer datasets. SePIA is straightforward to configure and organizes results into a perusable HTML report. Furthermore, the underlying pipeline engine supports computational resource management for optimal performance.
Conclusion
SePIA is an open-source workflow introducing standardized processing and analysis of RNA and small RNA data. SePIA’s modular design enables robust customization to a given experiment while maintaining overall workflow structure. It is available at http://anduril.org/sepia.
Electronic supplementary material
The online version of this article (doi:10.1186/s13040-016-0099-z) contains supplementary material, which is available to authorized users.
Highlights
Large-scale sequencing experiments are complex and require a wide spectrum of computational tools to extract and interpret relevant biological information.
The second and third datasets comprised Level 1 data of 144 poly(A)-extracted mRNA samples (129 tumor, 15 normal breast tissue) and 149 miRNA samples (133 tumor, 16 normal breast tissue) downloaded from The Cancer Genome Atlas consortium [32].
We organize the datasets into two case studies: the first to showcase SePIA’s utility for transcript-level sequence analysis; the second to demonstrate integration of mRNA and small RNA data.
Summary
Large-scale sequencing experiments are complex and require a wide spectrum of computational tools to extract and interpret relevant biological information. This is especially true in projects where individual processing and integrated analysis of both small RNA and complementary RNA data are needed. Such studies would benefit from a computational workflow that is easy to implement and standardizes the processing and analysis of both sequenced data types. Strategies have been developed to computationally identify and interpret biological information from different RNA-seq data types [2,3,4]. These strategies are generally limited to a single data type or integration alone, with a set number of tools and little to no support for extensibility. A solution to this issue is a modular computational platform that allows testing, development, and easy replacement of methods.
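To make the idea of a modular platform with replaceable methods concrete, the following is a minimal Python sketch. It is not SePIA or its underlying workflow engine: every name in it (`trim_adapter`, `drop_short`, `drop_ambiguous`, `run_pipeline`), the placeholder adapter string, and the toy reads are assumptions introduced only for illustration. It shows a pipeline whose steps are looked up by name, so one method can be swapped for another without touching the rest of the workflow.

```python
from typing import Callable, Dict, List

# A step takes a list of reads (plain strings here) and returns a new list.
Step = Callable[[List[str]], List[str]]


def trim_adapter(reads: List[str], adapter: str = "TGGAATTC") -> List[str]:
    """Hypothetical trimming step: cut each read at the first adapter match."""
    return [read.split(adapter)[0] for read in reads]


def drop_short(reads: List[str], min_len: int = 5) -> List[str]:
    """Hypothetical filter: discard reads shorter than min_len bases."""
    return [read for read in reads if len(read) >= min_len]


def drop_ambiguous(reads: List[str]) -> List[str]:
    """Alternative hypothetical filter: discard reads containing an N base."""
    return [read for read in reads if "N" not in read]


def run_pipeline(steps: Dict[str, Step], order: List[str], reads: List[str]) -> List[str]:
    """Run the named steps in order, feeding each step's output to the next."""
    for name in order:
        reads = steps[name](reads)
    return reads


# Toy input; the sequences are made up for this example.
reads = ["ACGTACGTTGGAATTCAAAA", "ACGTGGAATTC", "ACGNNACGTA"]

steps: Dict[str, Step] = {"trim": trim_adapter, "filter": drop_short}
result_a = run_pipeline(steps, ["trim", "filter"], reads)

# Replace only the filtering method; the pipeline structure is unchanged.
steps["filter"] = drop_ambiguous
result_b = run_pipeline(steps, ["trim", "filter"], reads)

print(result_a)
print(result_b)
```

Keeping the step registry separate from the execution order is one simple way to get the "easy replacement of methods" property described above; a real workflow engine would add dependency tracking, caching, and resource management on top of this.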