Protein remote homology detection represents a crucial and challenging task in bioinformatics: even if effective methods appeared in recent years, in several cases a proper characterization of remote evolutionary correlation can not be derived. In such situations, it may be possible that information derived from other sources helps, provided that it is possible to properly integrate such (even partial) information into existing models. In this paper, we provide some evidence that this route is feasible: inspired by the multimodal retrieval literature, we show how it is possible to exploit a simple multimodal approach to improve a model learned from a set of sequences, by using knowledge derived from a partial set of corresponding 3D structures. We investigate (with the SCOP 1.53 benchmark) the suitability of the proposed multimodal scheme, showing that a beneficial effect can be obtained even when a very reduced amount of structures are available. A further detailed analysis on a member of the GPCR superfamily confirms that this multimodal approach can extract information that cannot be obtained from sequence-based techniques.
Read full abstract