CoreFlow: A computational platform for integration, analysis and modeling of complex biological data

Adrian Pasculescu, Jonathan So, Marina Olhovsky, Tony Pawson, Rachel D Vanderlaan, Pau Creixell, Yong Zheng, Karen Colwill, Erwin M Schoof, Rune Linding, Ruijun Tian

doi:10.1016/j.jprot.2014.01.023

Abstract

A major challenge in mass spectrometry and other large-scale applications is how to handle, integrate, and model the data that is produced. Given the speed at which technology advances and the need to keep pace with biological experiments, we designed a computational platform, CoreFlow, which provides programmers with a framework to manage data in real time. It allows users to upload data into a relational database (MySQL) and to create custom scripts in high-level languages such as R, Python, or Perl for processing, correcting, and modeling this data. CoreFlow organizes these scripts into project-specific pipelines, tracks interdependencies between related tasks, and enables the generation of summary reports as well as publication-quality images. As a result, the gap between the experimental and computational components of a typical large-scale biology project is reduced, shortening the time between data generation, analysis, and manuscript writing. CoreFlow is being released to the scientific community as an open-source software package complete with proteomics-specific examples, which include corrections for incomplete isotopic labeling of peptides in SILAC experiments or for arginine-to-proline conversion, and modeling of multiple/selected reaction monitoring (MRM/SRM) results. CoreFlow was purposely designed as an environment for programmers to rapidly perform data analysis. These analyses are assembled into project-specific workflows that are readily shared with biologists to guide the next stages of experimentation. Its simple yet powerful interface provides a structure where scripts can be written and tested virtually simultaneously, shortening the code development cycle for a particular task. The scripts are exposed at every step so that a user can quickly see the relationships between the data, the assumptions that have been made, and the manipulations that have been performed.
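CoreFlow is described as organizing scripts into pipelines and tracking interdependencies between related tasks. A minimal sketch of one generic way to do this, assuming tasks form a directed acyclic graph: a topological sort yields a valid run order, and changing one script marks it and everything downstream of it as stale. The task names and the `PIPELINE`, `execution_order` and `stale_tasks` names are hypothetical, not CoreFlow's actual implementation or API. It uses the standard-library `graphlib` module (Python 3.9+), whose `static_order` raises `CycleError` if the dependencies contain a cycle.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "load_raw": set(),
    "correct_labeling": {"load_raw"},
    "normalize": {"correct_labeling"},
    "fit_model": {"normalize"},
    "report": {"fit_model", "normalize"},
}


def execution_order(pipeline):
    """Return an order in which every task runs after all of its dependencies."""
    return list(TopologicalSorter(pipeline).static_order())


def stale_tasks(pipeline, changed):
    """Return the changed task plus every task downstream of it, in run order."""
    # Invert the graph: task -> tasks that directly depend on it.
    dependents = {task: set() for task in pipeline}
    for task, deps in pipeline.items():
        for dep in deps:
            dependents[dep].add(task)

    # Depth-first walk over dependents, starting at the changed task.
    stale, stack = set(), [changed]
    while stack:
        task = stack.pop()
        if task not in stale:
            stale.add(task)
            stack.extend(dependents[task])

    # Keep only stale tasks, preserving a valid execution order.
    return [t for t in execution_order(pipeline) if t in stale]


if __name__ == "__main__":
    print(execution_order(PIPELINE))
    print(stale_tasks(PIPELINE, "normalize"))
```

With this example graph, editing the `normalize` script would require re-running `normalize`, `fit_model` and `report`, while `load_raw` and `correct_labeling` can reuse their earlier results.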
Since the scripts use commonly available programming languages, they can easily be transferred to and from other computational environments for debugging or faster processing. This focus on 'on the fly' analysis sets CoreFlow apart from other workflow applications that require wrapping scripts into particular formats and developing specific user interfaces. Importantly, current and future releases of data analysis scripts in CoreFlow format will be of widespread benefit to the proteomics community, not only for uptake and use in individual labs but also for enabling full scrutiny of all analysis steps, thus increasing experimental reproducibility and decreasing errors. This article is part of a Special Issue entitled: Can Proteomics Fill the Gap Between Genomics and Phenotypes?
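The abstract lists correction for incomplete isotopic labeling in SILAC experiments among the proteomics-specific examples. The published correction is not reproduced here; the following is only a hedged sketch under a simple assumed model in which each peptide carries a single labeled residue and a fraction p of the heavy sample is actually labeled. Then the observed heavy intensity is p·H and the observed light intensity is L + (1 − p)·H, so H = heavy_obs / p and L = light_obs − (1 − p)·H. The function name `correct_incomplete_labeling` is hypothetical.

```python
def correct_incomplete_labeling(light_obs, heavy_obs, incorporation):
    """Return a heavy/light ratio corrected for incomplete label incorporation.

    Assumed model (illustrative only, one labeled residue per peptide):
        heavy_obs = p * H
        light_obs = L + (1 - p) * H
    where p is the incorporation efficiency, H the true heavy-sample amount
    and L the true light-sample amount.
    """
    if not 0.0 < incorporation <= 1.0:
        raise ValueError("incorporation must be in (0, 1]")

    # Recover the total heavy-sample signal, including its unlabeled part.
    heavy_true = heavy_obs / incorporation
    # Remove the unlabeled heavy-sample signal that appears at the light mass.
    light_true = light_obs - (1.0 - incorporation) * heavy_true
    if light_true <= 0:
        raise ValueError("corrected light intensity is non-positive")
    return heavy_true / light_true


if __name__ == "__main__":
    # 95% incorporation: the raw ratio 950/1000 underestimates the true ratio.
    print(correct_incomplete_labeling(1000.0, 950.0, 0.95))
```

Here `correct_incomplete_labeling(1000.0, 950.0, 0.95)` gives approximately 1.053 (that is, 1000/950), compared with an uncorrected ratio of 0.95; with complete labeling (`incorporation=1.0`) the function returns the raw ratio unchanged.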


Journal: Journal of Proteomics
Publication Date: Feb 3, 2014
Citations: 23
License type: other-oa


Similar Papers

Computing Platforms for Big Biological Data Analytics: Perspectives and Challenges
Zekun Yin ... Weiguo Liu
Computational and Structural Biotechnology Journal | VOL. 15
01 Jan 2017

A Digital Platform for Integration and Analysis of Geophysical Monitoring Data from the Baikal Natural Zone
Andrey Pavlovich Grigoryuk ... Valeriy Viktorovich Kovalevskiy
Russian Digital Libraries Journal | VOL. 25
14 Nov 2022

Design and Research of Big Data Collection and Analysis Platform Based on Cloud Computing
Xuan Pei ... Xiaoying Ren
IOP Conference Series: Materials Science and Engineering | VOL. 677
01 Dec 2019

Knowledge and intelligent computing techniques in bioinformatics
Divya Anand ... Babita Pandey
International Journal of Computational Biology and Drug Design | VOL. 9
01 Jan 2015
