Hammock: a hidden Markov model-based peptide clustering algorithm to identify protein-interaction consensus motifs in large datasets

Adam Krejci, Borivoj Vojtesek, Matej Lexa, Ted R Hupp, Petr Muller

doi:10.1093/bioinformatics/btv522

Open Access

Abstract

Motivation: Proteins often recognize their interaction partners on the basis of short linear motifs located in disordered regions on protein surfaces. Experimental techniques that study such motifs use short peptides to mimic the structural properties of interacting proteins. Continued development of these methods allows for large-scale screening, resulting in vast amounts of peptide sequences, potentially containing information on multiple protein-protein interactions. Processing of such datasets is a complex but essential task for large-scale studies investigating protein-protein interactions.

Results: The software tool presented in this article is able to rapidly identify multiple clusters of sequences carrying shared specificity motifs in massive datasets from various sources and generate multiple sequence alignments of identified clusters. The method was applied to a previously published smaller dataset containing distinct classes of ligands for SH3 domains, as well as to a new dataset, an order of magnitude larger, containing epitopes for several monoclonal antibodies. The software successfully identified clusters of sequences mimicking epitopes of antibody targets, as well as secondary clusters revealing that the antibodies accept some deviations from original epitope sequences. Another test indicates that processing of even much larger datasets is computationally feasible.

Availability and implementation: Hammock is published under the GNU GPL v. 3 license and is freely available as a standalone program (from http://www.recamo.cz/en/software/hammock-cluster-peptides/) or as a tool for the Galaxy toolbox (from https://toolshed.g2.bx.psu.edu/view/hammock/hammock). The source code can be downloaded from https://github.com/hammock-dev/hammock/releases.

Contact: muller@mou.cz

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

Molecular interactions between proteins occur ubiquitously in cells and play central roles in most biological processes.
Existing tools perform well on smaller datasets of up to thousands of sequences, but they have not been designed to process datasets orders of magnitude larger. We address this issue by introducing Hammock, a novel software tool for peptide sequence clustering.
To visualize multiple sequence alignments of resulting clusters, we use sequence logos generated by WebLogo 3.4 (Crooks, 2004) throughout the article.

Summary

Introduction

Molecular interactions between proteins occur ubiquitously in cells and play central roles in most biological processes. These interactions are often mediated by short linear motifs located in disordered regions on the surface of one of the interacting partners (Dinkel et al, 2013). Libraries containing very large numbers of such short peptide sequences can be generated and used to discover the interaction preferences of proteins. Methods for doing so include phage display (Bratkovic, 2009) and other display-based methods, as well as technologies utilizing peptide microarrays (Halperin et al, 2010; Legutki et al, 2010; Stiffler et al, 2007).

Methods

Results

Conclusion