AlignMR

Urmi Bhayani, John Springer

doi:10.1145/2649387.2660823

Abstract

Proteomics is the study of the structure and behavior of proteins, and one of the primary approaches to protein identification and quantification is the analysis of mass spectrometry (MS) data. This analysis typically involves a series of computational steps, and the Purdue University Bindley Bioscience Center employs a computational workflow system, the Omics Discovery Pipeline (ODP), to assist in its analysis of MS data. One of the ODP's stages aligns the peaks in the MS data across multiple subjects. Because a study may include many subjects and each subject's MS data may contain many peaks, the alignment step qualifies as a data-intensive computation. This research focuses on using Apache Hadoop MapReduce to align the processed MS data faster than the serial approach currently used in the ODP.
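
The abstract does not specify the alignment algorithm, so the following is only an illustrative sketch of how peak alignment could be phrased as a map step and a reduce step. It assumes peaks are given as (subject, m/z, intensity) records and that peaks are matched by a fixed-width m/z bin. The names `BIN_WIDTH`, `map_peak`, `reduce_bin`, and `align` are hypothetical, and the phases run in a single Python process rather than on Hadoop.

```python
from collections import defaultdict

BIN_WIDTH = 0.5  # hypothetical m/z bin width; not taken from the paper


def map_peak(subject_id, mz, intensity):
    """Map step: key each peak by the index of its m/z bin."""
    bin_index = int(mz // BIN_WIDTH)
    return bin_index, (subject_id, mz, intensity)


def reduce_bin(bin_index, peaks):
    """Reduce step: summarize all peaks that landed in one bin."""
    mean_mz = sum(mz for _, mz, _ in peaks) / len(peaks)
    by_subject = {}
    for subject_id, _, intensity in peaks:
        # Keep the most intense peak per subject within this bin.
        by_subject[subject_id] = max(intensity, by_subject.get(subject_id, 0.0))
    return {"bin": bin_index, "mean_mz": mean_mz, "intensities": by_subject}


def align(peaks):
    """Run map, shuffle (group by key), and reduce in one process."""
    groups = defaultdict(list)
    for subject_id, mz, intensity in peaks:
        key, value = map_peak(subject_id, mz, intensity)
        groups[key].append(value)
    return [reduce_bin(key, groups[key]) for key in sorted(groups)]
```

Because each bin is reduced independently, the reduce calls could in principle run in parallel, which is the property a MapReduce framework exploits. Fixed-width binning is a simplification: two peaks that straddle a bin boundary end up in different groups even when their m/z values are close.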

Full Text