Abstract

We consider the identification of interacting protein-nucleic acid partners using the rigid body docking method FTdock, which is systematic and exhaustive in the exploration of docking conformations. The accuracy of rigid body docking methods is tested using known protein-DNA complexes for which the docked and undocked structures are both available. Additional tests with large decoy sets probe the efficacy of two published statistically derived scoring functions that contain a huge number of parameters. In contrast, we demonstrate that state-of-the-art machine learning techniques can enormously reduce the number of parameters required, thereby identifying the relevant docking features using a minuscule fraction of the number of parameters in the prior works. The present machine learning study considers a 300-dimensional vector (dependent on only 15 parameters), termed the Chemical Context Profile (CCP), where each dimension reflects a specific type of protein amino acid-nucleic acid base interaction. The CCP is designed to capture the chemical complementarities of the interface and is well suited for machine learning techniques. Our objective function is the Chemical Context Discrepancy (CCD), which is defined as the angle between the native system's CCP vector and the decoy's CCP vector and which serves as a substitute for the more commonly used root-mean-square deviation (RMSD). We demonstrate that the CCP provides a useful scoring function when certain dimensions are properly weighted. Finally, we explore how the amino acids on a protein's surface can help guide DNA binding, first through long-range interactions, followed by direct contacts, according to specific preferences for either the major or minor grooves of the DNA.
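Because the CCD is defined as the angle between two CCP vectors, it can be illustrated with a short sketch. The Python below is a minimal illustration rather than the authors' implementation: the function names, the toy 3-dimensional profiles, and the choice to report the angle in degrees are all assumptions made here, and neither the construction of the 300-dimensional CCP nor the weighting of its dimensions is modeled.

```python
import math
from typing import Dict, List, Sequence


def chemical_context_discrepancy(native_ccp: Sequence[float],
                                 decoy_ccp: Sequence[float]) -> float:
    """Angle (in degrees, an assumed unit) between two CCP vectors."""
    if len(native_ccp) != len(decoy_ccp):
        raise ValueError("CCP vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(native_ccp, decoy_ccp))
    norm_native = math.sqrt(sum(a * a for a in native_ccp))
    norm_decoy = math.sqrt(sum(b * b for b in decoy_ccp))
    if norm_native == 0.0 or norm_decoy == 0.0:
        raise ValueError("The angle is undefined for a zero-length CCP vector")
    # Clamp to [-1, 1] so floating-point round-off cannot break acos.
    cosine = max(-1.0, min(1.0, dot / (norm_native * norm_decoy)))
    return math.degrees(math.acos(cosine))


def rank_decoys_by_ccd(native_ccp: Sequence[float],
                       decoys: Dict[str, Sequence[float]]) -> List[str]:
    """Return decoy names ordered from smallest to largest CCD."""
    return sorted(
        decoys,
        key=lambda name: chemical_context_discrepancy(native_ccp, decoys[name]),
    )


# Hypothetical 3-dimensional profiles, used only to keep the example small.
native = [1.0, 0.0, 0.0]
decoys = {
    "pose_a": [0.0, 1.0, 0.0],  # orthogonal to the native profile
    "pose_b": [1.0, 1.0, 0.0],  # 45 degrees from the native profile
    "pose_c": [2.0, 0.0, 0.0],  # same direction, different magnitude
}
ranking = rank_decoys_by_ccd(native, decoys)
```

Since an angle ignores vector length, a decoy whose profile points in the same direction as the native profile has a CCD of zero regardless of its magnitude, so in this toy ranking `pose_c` comes first and `pose_a` last.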

Highlights

  • Interacting molecules convey information via their association that is driven by surface complementarity and chemical compatibility

  • The magnitude of the Chemical Context Profile (CCP) correlates with the loss of surface area upon binding

  • Profiles from different poses can be compared with one another by their similarity to the native CCP, where the similarity is defined using the Chemical Context Discrepancy (CCD)

Summary

Introduction

Interacting molecules convey information via their association that is driven by surface complementarity and chemical compatibility. Given a number of recent and promising structure prediction algorithms for DNA [2], RNA [3,4,5], and proteins [6,7], along with the wealth of data generated by various structural genomics initiatives [8,9], a major goal is to devise methods for the automatic determination of gene and protein networks at a molecular level and on a genomic scale. This daunting task requires docking algorithms that can handle the three major classes of molecules, as well as large-scale computing resources to perform the computations on a genomic scale. Other treatments use fuzzy restraints, for example, from experimental studies [18], but are restricted to treating either protein-DNA or protein-RNA interactions.
