Open Problems In Computational Biology Research Articles

Mechanistic dynamical models allow us to study the behavior of complex biological systems. They can provide an objective and quantitative understanding that would be difficult to achieve through other means. However, the systematic development of these models is a non-trivial exercise and an open problem in computational biology. Currently, many research efforts are focused on model discovery, i.e. automating the development of interpretable models from data. One of the main frameworks is sparse regression, where the sparse identification of nonlinear dynamics (SINDy) algorithm and its variants have enjoyed great success. SINDy-PI is an extension which allows the discovery of rational nonlinear terms, thus enabling the identification of kinetic functions common in biochemical networks, such as Michaelis-Menten. SINDy-PI also pays special attention to the recovery of parsimonious models (Occam's razor). Here we focus on biological models composed of sets of deterministic nonlinear ordinary differential equations. We present a methodology that, combined with SINDy-PI, allows the automatic discovery of structurally identifiable and observable models which are also mechanistically interpretable. The lack of structural identifiability and observability makes it impossible to uniquely infer parameter and state variables, which can compromise the usefulness of a model by distorting its mechanistic significance and hampering its ability to produce biological insights. We illustrate the performance of our method with six case studies. We find that, despite enforcing sparsity, SINDy-PI sometimes yields models that are unidentifiable. In these cases we show how our method transforms their equations in order to obtain a structurally identifiable and observable model which is also interpretable.

Read full abstract

BackgroundThe protein folding problem remains one of the most challenging open problems in computational biology. Simplified models in terms of lattice structure and energy function have been proposed to ease the computational hardness of this optimization problem. Heuristic search algorithms and constraint programming are two common techniques to approach this problem. The present study introduces a novel hybrid approach to simulate the protein folding problem using constraint programming technique integrated within local search.ResultsUsing the face-centered-cubic lattice model and 20 amino acid pairwise interactions energy function for the protein folding problem, a constraint programming technique has been applied to generate the neighbourhood conformations that are to be used in generic local search procedure. Experiments have been conducted for a few small and medium sized proteins. Results have been compared with both pure constraint programming approach and local search using well-established local move set. Substantial improvements have been observed in terms of final energy values within acceptable runtime using the hybrid approach.ConclusionConstraint programming approaches usually provide optimal results but become slow as the problem size grows. Local search approaches are usually faster but do not guarantee optimal solutions and tend to stuck in local minima. The encouraging results obtained on the small proteins show that these two approaches can be combined efficiently to obtain better quality solutions within acceptable time. It also encourages future researchers on adopting hybrid techniques to solve other hard optimization problems.

Read full abstract

Open Problems In Computational Biology Research Articles

Articles published on Open Problems In Computational Biology

Distilling identifiable and interpretable dynamic models from biological data.

Characterizing Star-PCGs

Protein Sequence Alignment Analysis by Local Covariation: Coevolution Statistics Detect Benchmark Alignment Errors

Modeling Structures and Motions of Loops in Protein Molecules

A hybrid approach to protein folding problem integrating constraint programming with local search

Classification of co-expressed genes from DNA regulatory regions

Approximate protein folding in the HP side chain model on extended cubic lattices

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Open Problems In Computational Biology Research Articles

Articles published on Open Problems In Computational Biology

Distilling identifiable and interpretable dynamic models from biological data.

Characterizing Star-PCGs

Protein Sequence Alignment Analysis by Local Covariation: Coevolution Statistics Detect Benchmark Alignment Errors

Modeling Structures and Motions of Loops in Protein Molecules

A hybrid approach to protein folding problem integrating constraint programming with local search

Classification of co-expressed genes from DNA regulatory regions

Approximate protein folding in the HP side chain model on extended cubic lattices