Frequently at scientific meetings, proteome researchers gather after the formal program has ended to discuss the state of the field and their own situation over some drinks. Typically these discussions center on instrumentation, primarily mass spectrometers with faster scan rates, higher mass accuracy, or higher sensitivity, and on the results generated by these powerful instruments. Statements like “this is the future,” “this will revolutionize proteomics,” or “this has changed what we do in the lab” can be heard around the table. These sentiments are certainly justified. The progress in mass spectrometric instrumentation over the short history of proteomics has been nothing short of astounding. Just a decade ago, virtually all proteomic data were generated on ion trap instruments with low resolution, low mass accuracy, and limited scan speed and sensitivity compared with the instruments presently in common use. Today, a significant fraction of proteomic data is collected on high-performance hybrid mass spectrometers with high scan speed, high mass accuracy and resolution, and high sensitivity of detection. The effects of these dramatic improvements are apparent in the manuscripts being published.

However, its importance notwithstanding, data collection is only half of the proteomic story. The capacity to annotate, analyze, store, manage, and distribute the many gigabytes of data collected is the other half of the proteomic technology tandem. Long gone are the days when the expert analyst could pretend to have manually verified all the data supporting the conclusions of a proteomic paper. Proteomics today, particularly mass spectrometry-based proteomics, therefore depends as much on computers and the software tools that run on them as it does on high-tech instruments to generate the data.

Surprisingly, at these sessions in the bar, no one ever seems to boast about the data analysis part of the proteomic enterprise. Statements such as “this program just recovered 20% more true positive peptide identifications from my data set,” “this tool allowed me to analyze my data ten times faster and at a credible FDR,” or “this statistical model allowed me to identify statistically significant quantitative changes in my data set” are seldom if ever uttered. It is unlikely that software is inherently less appealing than high-performance hardware, because the same individuals who boast about their mass spectrometers are likely also to share interesting new iPhone apps. It is certainly also not the case that advances on the software side of proteomics have had a lower impact than publications describing new hardware; in fact, some of the most highly cited proteomics papers are about software. I therefore propose the likely controversial thesis that the software world of proteomics is less widely discussed because few researchers can keep up with the field or know it in any depth. This view is supported by several observations. First, in a recent study (1) in which a standardized protein sample was analyzed by a number of laboratories, it was apparent that the analysis of the collected data was a major weakness. Second, the adoption of new software tools for proteomic data analysis takes much longer than the adoption of new instruments or wet-lab protocols.
To raise the level of awareness of the crucial area of proteomic software tools, Molecular & Cellular Proteomics will publish a series of mini-reviews that summarize the current state of specific aspects of this often bewildering field. These are not tutorials, i.e., the reader will still have to figure out which tools to use and how to use them. The intent of the series is to discuss the significance of a particular type of analysis, the question(s) it addresses, the currently available solutions and their strengths and weaknesses, and the availability and utility of germane software tools and computational resources. We expect that these mini-reviews will help proteome researchers extract more and better information from the data generated by the wonderful instruments available today, and that they will increase the excitement about the topic so that statistical models, search engines, and XML schemas can also claim the stature in bar discussions that they deserve.