Similarities in binding cavities attract attention for the prediction and optimization of ligand selectivity. Glinca and Klebe propose a clustering based on physicochemical properties of the binding site analyzed with Cavbase and conclude that their novel cavity-based method tells more than sequences.5 We agree that protein structures are key to understanding ligand recognition. Still, we think that sequences can tell a lot if the focus is shifted away from protein sequences toward substrate sequences. We show that an analysis of protease substrates, which inherently contain valuable information about binding site characteristics, can be directly utilized to predict potential off-targets.

Selectivity is a central issue in drug design, as drugs frequently hit more than a single target.1 Therefore, molecular modeling aims at the prediction of polypharmacology using a variety of approaches. Applied methods include ligand-based and structure-based methods as well as network analyses.2−4 Glinca and Klebe demonstrated recently that similarities in physicochemical characteristics of the binding cavity directly relate to overlapping substrate readout.5 By application to protease test sets, they show that their cavity-based approach yields results similar to an analysis of ligand data from ChEMBL,6 thereby outperforming a similarity analysis of protease sequences. Hence, they conclude that “cavities tell more than sequences”.

We definitely agree that structural information on the binding site is crucial in the rationalization of substrate recognition. Still, we think that sequence information can contribute significantly to an understanding of substrate specificity when the focus is shifted from protease sequences toward substrate sequences. A plethora of protease substrate sequences has been deposited in the MEROPS database in recent years.7 They are frequently depicted as sequence logos8 to visualize substrate preferences of proteases.
Recently, we showed how these sequence logos can be utilized to yield a quantitative metric for protease specificity.9 Thereby, we also showed that protein sequence information alone is insufficient to predict protease specificity. Furthermore, similarities in protease substrate recognition can be directly deduced via analysis of sequence logos.10 We expect this approach to complement structure-based comparisons, as substrate sequences inherently contain information on binding site characteristics. Substrate peptides probe protease cavities via features similar to those used by Cavbase11 through binding of hydrophobic and hydrophilic, positively and negatively charged, and aromatic amino acids.

We performed a substrate sequence-based similarity analysis of the serine protease test set of Glinca and Klebe. Substrate data were downloaded from MEROPS, normalized to the respective natural abundance of amino acids,12 and converted to vectors containing 20 amino acid probabilities at each of the eight substrate positions P4 to P4′. After normalization, scalar products of these substrate vectors yield pairwise protease similarities10 ranging from 0 to 1. A comparison of all eleven serine proteases in the set yields a heat map depicting similarities in protease substrate recognition (see Figure 1). Furthermore, a complete-linkage hierarchical clustering yielding six clusters was performed, as suggested by Glinca and Klebe.

Figure 1. Heatmap obtained for clustering of proteases based on similarities in peptide substrates. Deep blue color depicts maximum similarity, whereas red regions show dissimilarity in substrate recognition. Six resulting protease clusters are separated with horizontal ...

The resulting protease similarity map and clustering show pronounced overlap with the cavity-based analysis of Glinca and Klebe. Thus, substrate sequence analysis shows discriminative power similar to that of an analysis of binding pockets.
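The similarity calculation described above (substrate vectors over P4 to P4′, normalization, scalar products, and complete-linkage clustering) can be illustrated with a minimal Python sketch. This is not the original implementation: the function names, the toy peptides, the placeholder protease names, and the uniform default background are all assumptions made for illustration, and the exact normalization used in the analysis may differ. Real input would be MEROPS substrate data together with natural amino acid abundances.

```python
import math
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_POSITIONS = 8  # P4, P3, P2, P1, P1', P2', P3', P4'


def substrate_vector(substrates, background=None):
    """Build a unit-length vector of 8 x 20 abundance-corrected frequencies."""
    if background is None:
        # Assumption: uniform background; real use would pass natural abundances.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    vec = []
    for pos in range(N_POSITIONS):
        column = [s[pos] for s in substrates]
        for aa in AMINO_ACIDS:
            freq = column.count(aa) / len(column)
            vec.append(freq / background[aa])
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def similarity(u, v):
    """Scalar product of two unit vectors; lies in [0, 1] for non-negative entries."""
    return sum(a * b for a, b in zip(u, v))


def complete_linkage(names, sim, n_clusters):
    """Agglomerative clustering; cluster distance is the largest member distance."""
    clusters = [[name] for name in names]

    def distance(pair):
        i, j = pair
        return max(1.0 - sim[a][b] for a in clusters[i] for b in clusters[j])

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2), key=distance)
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Hypothetical 8-mer cleavage sites (P4..P4'); invented toy data, not from MEROPS.
toy_substrates = {
    "protease_A": ["AAGRSLVA", "GAPRSVGA", "LSGRAIGG"],
    "protease_B": ["VAGRSLLA", "GSPRAVGA", "LAGRSIGG"],
    "protease_C": ["AAPFSLVA", "GALFAVGA", "LSPYSIGG"],
}

vectors = {name: substrate_vector(seqs) for name, seqs in toy_substrates.items()}
sim = {a: {b: similarity(vectors[a], vectors[b]) for b in vectors} for a in vectors}
clusters = complete_linkage(list(vectors), sim, n_clusters=2)

for a, b in combinations(vectors, 2):
    print(f"{a} vs {b}: {sim[a][b]:.2f}")
print(clusters)
```

In this toy set, the two placeholder proteases that share arginine at P1 end up in the same cluster. For the eleven serine proteases discussed here, `n_clusters` would be set to six, matching the number of clusters stated in the text.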
Urokinase-type (uPA) and tissue-type plasminogen activator (tPA) form a consistent cluster, as in the study of Glinca and Klebe. Furthermore, our clustering nicely groups trypsin, thrombin, and factor Xa (FXa), known to show pronounced overlap in substrate recognition of small molecules.13 In conclusion, we show that sequences can tell a lot about substrate recognition of proteases if substrate sequences are considered. We are confident that peptide substrates contain valuable information on protease recognition and propose their use for the prediction of off-target effects, thereby complementing structure-based approaches.