De novo protein structure prediction can be formulated as a search in a high-dimensional space. One of the most frequently used computational tools to solve such search problems is the Monte Carlo method. We present a novel search technique, called model-based search. This method samples the high-dimensional search space to build an approximate model of the underlying function. This model is incrementally refined in areas of interest, whereas areas that are not of interest are excluded from further exploration. Model-based search derives its efficiency from the fact that the information obtained during the exploration of the search space is used to guide further exploration. In contrast, Monte Carlo-based techniques lack memory: exploration proceeds by random walks that ignore the information obtained in previous steps. Model-based search is applied to protein structure prediction, where search is employed to find the global minimum of the protein's energy landscape. We show that model-based search uses computational resources more efficiently to find lower-energy conformations of proteins than one of the leading protein structure prediction methods, which relies on a tailored Monte Carlo method to perform its search. The performance improvements become more pronounced as the dimensionality of the search problem increases. We argue that model-based search will enable more accurate protein structure prediction than was previously possible. Furthermore, we believe that similar performance improvements can be expected in other problems that are currently solved using Monte Carlo-based search methods. An implementation of model-based search can be obtained by contacting the authors.
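The general idea of sampling, modeling, and refining can be illustrated with a small toy sketch. This is not the authors' implementation, and every name and parameter in it is an assumption made for illustration. Here the "model" is simply a list of box-shaped regions around the lowest-energy samples seen so far, `toy_energy` is a hypothetical stand-in for a protein energy function, and `keep` and `shrink` are invented tuning parameters. Regions that do not contain good samples are dropped, which mirrors the exclusion of uninteresting areas described above.

```python
import random


def toy_energy(x):
    # Hypothetical stand-in for an energy function: a quadratic bowl
    # with its global minimum (value 0) at (1, 1, ..., 1).
    return sum((xi - 1.0) ** 2 for xi in x)


def model_based_search(f, dim, bounds, n_rounds=20, samples_per_region=20,
                       keep=5, shrink=0.7, seed=0):
    """Toy sketch: sample, keep promising regions, refine them, drop the rest."""
    rng = random.Random(seed)
    lo, hi = bounds
    # The "model" is a list of (center, radius) boxes.
    # It starts as a single box covering the whole search space.
    regions = [([(lo + hi) / 2.0] * dim, (hi - lo) / 2.0)]
    best_x, best_val = None, float("inf")

    for _ in range(n_rounds):
        scored = []
        for center, radius in regions:
            for _ in range(samples_per_region):
                # Draw a sample inside the box, clamped to the global bounds.
                x = [min(hi, max(lo, c + rng.uniform(-radius, radius)))
                     for c in center]
                val = f(x)
                scored.append((val, x, radius))
                if val < best_val:
                    best_x, best_val = x, val
        # Model update: the lowest-energy samples become centers of smaller
        # boxes; all other areas are excluded from further sampling.
        scored.sort(key=lambda item: item[0])
        regions = [(x, radius * shrink) for _, x, radius in scored[:keep]]

    return best_x, best_val


if __name__ == "__main__":
    x, val = model_based_search(toy_energy, dim=2, bounds=(-5.0, 5.0))
    print("lowest energy found:", val)
```

Unlike a memoryless random walk, each round of this sketch decides where to sample from the results of all samples in the previous round. A realistic energy landscape is rugged rather than bowl-shaped, so a practical model would need to keep several distinct regions alive for longer than this sketch does.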