Automatic peak selection by a Benjamini-Hochberg-based algorithm.

Ahmed Abbas, Bing-Yi Jing, Zhi Liu, Xin-Bing Kong, Xin Gao, Anna Tramontano

doi:10.1371/journal.pone.0053112

Abstract

A common issue in bioinformatics is that computational methods often generate a large number of predictions sorted according to certain confidence scores. A key problem is then determining how many predictions must be selected to include most of the true predictions while maintaining reasonably high precision. In nuclear magnetic resonance (NMR)-based protein structure determination, for instance, computational peak picking methods are becoming increasingly common, although expert knowledge remains the method of choice for determining how many of the thousands of candidate peaks should be considered in order to capture the true peaks. Here, we propose a Benjamini-Hochberg (B-H)-based approach that automatically selects the number of peaks. We formulate the peak selection problem as a multiple testing problem. Given a candidate peak list sorted by either volumes or intensities, we first convert the peaks into p-values and then apply the B-H-based algorithm to automatically select the number of peaks. The proposed approach is tested on state-of-the-art peak picking methods, including WaVPeak [1] and PICKY [2]. Compared with the traditional fixed number-based approach, our approach returns significantly more true peaks. For instance, by combining WaVPeak or PICKY with the proposed method, the missing peak rates are on average reduced by 20% and 26%, respectively, in a benchmark set of 32 spectra extracted from eight proteins. The consensus of the B-H-selected peaks from both WaVPeak and PICKY achieves 88% recall and 83% precision, which significantly outperforms each individual method and the consensus method without the B-H algorithm. The proposed method can be used as a standard procedure for any peak picking method and can be straightforwardly applied to other prediction selection problems in bioinformatics. The source code, documentation and example data of the proposed method are available at http://sfb.kaust.edu.sa/pages/software.aspx.
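To make the selection step concrete, the following is a minimal sketch of the standard Benjamini-Hochberg step-up rule applied to a list of p-values. It is not the authors' implementation: the abstract does not say how peak volumes or intensities are converted into p-values, so the sketch assumes the p-values are already available, and the function name `bh_select_count` and the default level `alpha = 0.05` are hypothetical choices made for illustration.

```python
from typing import Sequence


def bh_select_count(p_values: Sequence[float], alpha: float = 0.05) -> int:
    """Return how many of the smallest p-values the standard B-H rule keeps.

    Assumption: p_values holds one p-value per candidate peak; the
    peak-to-p-value conversion is outside the scope of this sketch.
    """
    m = len(p_values)
    ordered = sorted(p_values)  # ascending: most significant first
    k = 0
    # Step-up rule: keep the LARGEST rank i with p_(i) <= alpha * i / m,
    # even if some smaller ranks fail the comparison.
    for i, p in enumerate(ordered, start=1):
        if p <= alpha * i / m:
            k = i
    return k


if __name__ == "__main__":
    # Hypothetical p-values for ten candidate peaks.
    candidates = [0.001, 0.008, 0.039, 0.041, 0.042,
                  0.06, 0.074, 0.205, 0.212, 0.216]
    n_selected = bh_select_count(candidates, alpha=0.05)
    print(f"Peaks selected: {n_selected} of {len(candidates)}")
```

Under these assumptions, the returned count replaces a hand-chosen fixed number: the peaks with the `n_selected` smallest p-values are kept and the rest are discarded.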

Highlights

Many computational bioinformatics methods generate a large number of candidate predictions for a problem, among which are both true and false predictions.
In nuclear magnetic resonance (NMR)-based protein structure determination, thousands of peaks are routinely predicted from the input spectra, in which there are usually tens to hundreds of true signals.
We demonstrate that the proposed method significantly outperforms the fixed number-based method in selecting the true peaks from the predictions of state-of-the-art peak picking methods, including WaVPeak and PICKY.

Summary

Introduction

Many computational bioinformatics methods generate a large number of candidate predictions for a problem, among which are both true and false predictions. Such predictions are usually sorted according to certain confidence scores. For example, energy values are calculated for each model based on a given energy function, where lower values likely indicate better models. Another example is the protein function annotation problem, in which the amino acid sequence or the domain architecture of a protein is given and Gene Ontology (GO) terms, selected from among some 30,000, are used to annotate the function. It is crucial to know how many predictions should be selected in such scenarios.

Journal: PLoS ONE	Publication Date: Jan 7, 2013
Citations: 55	License type: CC BY 4.0

Similar Papers

Trainable segmentation for transmission electron microscope images of inorganic nanoparticles.
Cameron G Bell ... Angus I Kirkland
Journal of Microscopy | VOL. 288
11 May 2022

Optimization of the Use of Consensus Methods for the Detection and Putative Identification of Peptides via Mass-spectrometry Using Protein Standard Mixtures
Tamanna Sultana ... James Lyons-Weiler
Journal of Proteomics & Bioinformatics | VOL. 02
01 Jun 2009

An Algorithm for Early Outbreak Detection in Multiple Data Streams
Sesha K Dassanayake ... Joshua French
Online Journal of Public Health Informatics | VOL. 11
30 May 2019

Testing Jumps via False Discovery Rate Control
Yu-Min Yen
SSRN Electronic Journal | VOL. -
21 Mar 2013