Discovering short linear protein motif based on selective training of profile hidden Markov models

Tao Song, Hong Gu

doi:10.1016/j.jtbi.2015.03.010

Abstract

Short linear motifs (SLiMs) are relatively conserved sequence patterns within disordered regions of proteins, typically 3–10 amino acids in length. They play an important role in mediating protein–protein interactions. Computational discovery of SLiMs has attracted increasing attention, and most existing methods are based on regular expressions or profiles. In this paper, we propose a de novo motif discovery method based on profile hidden Markov models (HMMs), which not only provide the emission probabilities of amino acids at the defined positions of SLiMs but also model the undefined positions. We adopted ordered region masking and relative local conservation (RLC) masking to improve the signal-to-noise ratio of the query sequences, and applied evolutionary weighting so that evolutionarily important sequences receive more attention during the selective training of the profile HMMs. The experimental results show that our method and the profile-based method returned different subsets within a SLiM dataset, and that the performance of the two approaches is equivalent on a more realistic discovery dataset. Profile HMM-based motif discovery methods therefore complement existing methods and provide another route to SLiM analysis.
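
To make the ideas of masking and weighted emission estimation concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes motif instances are already aligned and of equal length, that masked residues are written as `X`, and that each sequence carries a precomputed weight. The function names `mask_sequence` and `weighted_emissions`, the pseudocount value, and the example data are all hypothetical. It covers only match-state emissions, not the insert and delete states or the training procedure of a full profile HMM.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mask_sequence(seq, keep):
    """Replace every residue whose keep flag is False with 'X'.

    `keep` is a hypothetical per-residue boolean list, e.g. True for
    residues predicted to be disordered and locally conserved.
    """
    return "".join(res if flag else "X" for res, flag in zip(seq, keep))


def weighted_emissions(instances, weights, pseudocount=1.0):
    """Estimate per-position emission probabilities from aligned instances.

    Each instance contributes its weight (an assumed stand-in for an
    evolutionary weight) to the count of the residue it shows at each
    position. Masked residues ('X') contribute nothing.
    """
    length = len(instances[0])
    profile = []
    for pos in range(length):
        # Start from a uniform pseudocount so no probability is zero.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for inst, weight in zip(instances, weights):
            residue = inst[pos]
            if residue in counts:
                counts[residue] += weight
        total = sum(counts.values())
        profile.append({aa: c / total for aa, c in counts.items()})
    return profile


# Hypothetical aligned instances of a proline-rich motif; the third
# instance has its second residue masked.
instances = ["PPLP", "PPAP", "PXLP"]
weights = [1.0, 0.5, 1.0]
profile = weighted_emissions(instances, weights)
```

In this sketch a down-weighted sequence shifts the emission distribution less than a fully weighted one, and a masked residue leaves its column untouched, which is the intuition behind combining masking with weighting before training.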
