Abstract

BackgroundWe present Pegasys – a flexible, modular and customizable software system that facilitates the execution and data integration from heterogeneous biological sequence analysis tools.ResultsThe Pegasys system includes numerous tools for pair-wise and multiple sequence alignment, ab initio gene prediction, RNA gene detection, masking repetitive sequences in genomic DNA as well as filters for database formatting and processing raw output from various analysis tools. We introduce a novel data structure for creating workflows of sequence analyses and a unified data model to store its results. The software allows users to dynamically create analysis workflows at run-time by manipulating a graphical user interface. All non-serial dependent analyses are executed in parallel on a compute cluster for efficiency of data generation. The uniform data model and backend relational database management system of Pegasys allow for results of heterogeneous programs included in the workflow to be integrated and exported into General Feature Format for further analyses in GFF-dependent tools, or GAME XML for import into the Apollo genome editor. The modularity of the design allows for new tools to be added to the system with little programmer overhead. The database application programming interface allows programmatic access to the data stored in the backend through SQL queries.ConclusionsThe Pegasys system enables biologists and bioinformaticians to create and manage sequence analysis workflows. The software is released under the Open Source GNU General Public License. All source code and documentation is available for download at .

Highlights

  • We present Pegasys – a flexible, modular and customizable software system that facilitates the execution and data integration from heterogeneous biological sequence analysis tools

  • To demonstrate the utility of Pegasys in widely different bioinformatics tasks, we present three use cases of the system: a single application workflow, a workflow designed for formatting a database for BLAST [12,13] and searching the newly formatted database, and a workflow designed for genome annotation of eukaryotic genomic sequence

  • The first is a FASTA-formatted nucleotide sequence file. This file is used as input to 'formatdb' – an application that transforms FASTA-formatted databases into a format that can be used by BLAST

Read more

Summary

Results

The Pegasys system includes numerous tools for pair-wise and multiple sequence alignment, ab initio gene prediction, RNA gene detection, masking repetitive sequences in genomic DNA as well as filters for database formatting and processing raw output from various analysis tools. We introduce a novel data structure for creating workflows of sequence analyses and a unified data model to store its results. The software allows users to dynamically create analysis workflows at run-time by manipulating a graphical user interface. The uniform data model and backend relational database management system of Pegasys allow for results of heterogeneous programs included in the workflow to be integrated and exported into General Feature Format for further analyses in GFF-dependent tools, or GAME XML for import into the Apollo genome editor. The database application programming interface allows programmatic access to the data stored in the backend through SQL queries

Conclusions
Background
Schedule DAG for execution
Results and discussion
15. R Development Core Team
17. Gish W
21. Krogh A
28. Booch G
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.