Abstract

A new method termed “Relative Principal Components Analysis” (RPCA) is introduced that extracts optimal relevant principal components to describe the change between two data samples representing two macroscopic states. The method is widely applicable in data-driven science. The calculation of the components is based on a physical framework that introduces the Kullback–Leibler (KL) divergence as the objective function appropriate for quantifying the change of the macroscopic state caused by changes in the microscopic features. To demonstrate the applicability of RPCA, we analyze the thermodynamically relevant conformational changes of the protein HIV-1 protease upon binding to different drug molecules. In this case, the RPCA method provides a sound thermodynamic foundation for analyzing the binding process and thus characterizing both the collective and the locally relevant conformational changes. Moreover, the relevant collective conformational changes can be reconstructed from the informative latent variables to exhibit both the enhanced and the restricted conformational fluctuations upon ligand association.
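
To illustrate the basic idea in code, here is a minimal sketch of the underlying linear algebra, assuming both states are approximated as multivariate Gaussians over the same features; this is an illustrative assumption on our part, not the authors' published implementation. Under that assumption, the directions along which the Gaussian KL divergence between two samples decomposes are obtained from a generalized eigenvalue problem of the two covariance matrices:

    import numpy as np
    from scipy.linalg import eigh

    def relative_components(X_a, X_b, eps=1e-8):
        """Illustrative 'relative' components between two samples (rows = observations).

        Assumes each state is modeled as a multivariate Gaussian; the generalized
        eigenvectors of the two covariance matrices then give directions along which
        the Gaussian KL divergence decomposes into independent contributions.
        Sketch only -- not the published RPCA code.
        """
        mu_a, mu_b = X_a.mean(axis=0), X_b.mean(axis=0)
        C_a = np.cov(X_a, rowvar=False) + eps * np.eye(X_a.shape[1])
        C_b = np.cov(X_b, rowvar=False) + eps * np.eye(X_b.shape[1])

        # Generalized eigenproblem C_b v = lambda * C_a v; eigenvectors are C_a-orthonormal.
        lam, V = eigh(C_b, C_a)

        # Per-direction KL contribution: variance change plus mean shift along each direction.
        shift = (V.T @ (mu_b - mu_a)) ** 2
        kl_per_dir = 0.5 * (lam - np.log(lam) - 1.0 + shift)

        order = np.argsort(kl_per_dir)[::-1]  # most informative directions first
        return V[:, order], kl_per_dir[order]

Ranking the directions by their KL contribution singles out the few components along which the two states differ most; the remaining directions describe fluctuations common to both samples.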

Highlights

  • Studying the transitions and differences between multiple states populated by a dynamic system is a central topic in different fields including chemistry, physics, biology, machine learning, and all of data-driven science

  • Before going into the technical details of finding the directions in feature space that are informative of the change between two states, we first introduce a physical framework for defining and quantifying the change of dynamic systems in all areas of data-driven science and justify the objective function used for quantifying the macroscopic change

  • We introduced the relative principal components analysis (RPCA) method, which extracts the relevant principal components describing the change between two macroscopic states of a dynamic system, each represented by a data sample


Summary

INTRODUCTION

Studying the transitions and differences between multiple states populated by a dynamic system is a central topic in different fields including chemistry, physics, biology, machine learning, and all of data-driven science. Before going into the technical details of finding the directions in feature space that are informative of the change between two states, we first introduce a physical framework for defining and quantifying the change of dynamic systems in all areas of data-driven science and justify the objective function used for quantifying the macroscopic change. Besides being used to formulate a theoretical framework for studying the error bound of parameter estimation,[25] the formalism of exponential families plays a central role in different fields of machine learning such as generalized linear models and variational inference.[3] In statistical thermodynamics, Kirkwood introduced his thermodynamic integration (TI) equation using the exponential family to “alchemically” interpolate between two macroscopic states.[20] The one-dimensional sufficient statistic, “the perturbation,” is the appropriate tool for interpolating between two macroscopic states in free energy calculations; significant perturbations are reflected by significant changes of the KL divergence.
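
As a concrete, deliberately simple illustration of that last point (our own example, not taken from the paper): for two univariate Gaussian states the KL divergence has a closed form, and increasing the perturbation of the mean directly increases the divergence between the two macroscopic states:

    import numpy as np

    def kl_gauss(mu_p, var_p, mu_q, var_q):
        """KL(P || Q) for univariate Gaussians P = N(mu_p, var_p), Q = N(mu_q, var_q)."""
        return 0.5 * (var_p / var_q + (mu_q - mu_p) ** 2 / var_q - 1.0
                      + np.log(var_q / var_p))

    # A larger perturbation of the mean yields a larger divergence between the two states.
    for shift in (0.1, 0.5, 1.0, 2.0):
        print(f"mean shift {shift:3.1f} -> KL = {kl_gauss(0.0, 1.0, shift, 1.0):.3f}")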

