Abstract

The process of assigning a finite set of tags or labels to a collection of observations, subject to side conditions, is notable for its computational complexity. This labeling paradigm is of theoretical and practical relevance to a wide range of biological applications, including the analysis of data from DNA microarrays, metabolomics experiments, and biomolecular nuclear magnetic resonance (NMR) spectroscopy. We present a novel algorithm, called Probabilistic Interaction Network of Evidence (PINE), that achieves robust, unsupervised probabilistic labeling of data. The computational core of PINE uses estimates of evidence derived from empirical distributions of previously observed data, along with consistency measures, to drive a fictitious system M with Hamiltonian H to a quasi-stationary state that produces probabilistic label assignments for relevant subsets of the data. We demonstrate the successful application of PINE to a key task in protein NMR spectroscopy: that of converting peak lists extracted from various NMR experiments into assignments associated with probabilities for their correctness. This application, called PINE-NMR, is available from a freely accessible computer server (http://pine.nmrfam.wisc.edu). The PINE-NMR server accepts as input the sequence of the protein plus user-specified combinations of data corresponding to an extensive list of NMR experiments; it provides as output a probabilistic assignment of NMR signals (chemical shifts) to sequence-specific backbone and aliphatic side chain atoms plus a probabilistic determination of the protein secondary structure. PINE-NMR can accommodate prior information about assignments or stable isotope labeling schemes. As part of the analysis, PINE-NMR identifies, verifies, and rectifies problems related to chemical shift referencing or erroneous input data. PINE-NMR achieves robust and consistent results that have been shown to be effective in subsequent steps of NMR structure determination.

Highlights

  • Labeling a set of fixed data with another representative set is the generic description for a large family of problems

  • The Probabilistic Interaction Network of Evidence (PINE) algorithm that we present here offers a general solution to this problem

  • We have demonstrated the usefulness of the PINE approach by applying it to one of the major bottlenecks in nuclear magnetic resonance (NMR) spectroscopy

Read more

Summary

Introduction

Labeling a set of fixed data with another representative set is the generic description for a large family of problems. This family includes clustering and dimensionality reduction, an approach in which the original dataset is represented by a set of typically far lower dimension (the representative set). The labeling problem is important, because it is encountered in many applications involving data analysis, where prior knowledge of the probability distributions is incomplete or lacking. A challenging instance of the labeling problem arises naturally in nuclear magnetic resonance (NMR) spectroscopy, which along with X-ray crystallography is one of the two major methods for determining protein structures. The labeling problem known as the ‘‘assignment problem’’, has been one of the major bottlenecks in protein NMR spectroscopy

Objectives
Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.