Many proteins undergo important post-translational proteolytic processing to remove targeting signals and activation peptides, and most proteins undergo proteolytic inactivation and catabolism. The enzymes that hydrolyse the peptide bonds in proteins and peptides are known as peptidases, proteases or proteolytic enzymes. The MEROPS database ("http://merops.sanger.ac.uk":http://merops.sanger.ac.uk) presents the classification and nomenclature of peptidases, their inhibitors and substrates. In 1993 we proposed the scheme for the classification of peptidases that has been internationally accepted, and in 1996 we established the MEROPS database. Protein inhibitors have been included in the database since 2004. About 2% of the genes in a genome encode peptidase homologues, and a further 1% encode protein inhibitors. For example, the human genome has 1037 genes encoding peptidase homologues (of which 643 are known or predicted to be active peptidases) and 433 protein inhibitor genes (of which 144 have been biochemically characterized as inhibitors).
The MEROPS classification is hierarchical. Sequences are grouped into a protein species (each of which is given a unique identifier, for example C01.060 for cathepsin B); protein species are grouped into a family (for example C1); and families are grouped into a clan (for example CA). To be included in the same protein species, sequences must be derived from the same node on a dendrogram derived from the family sequence alignment and must be known (or predicted) to share similar specificity. To be included in the same family, sequences must be homologous over the sequence domain that contains the active site residues (peptidases) or reactive site (inhibitors). To be included in the same clan, the proteins must share similar tertiary structures (or the same linear arrangement of active site residues if the structure is unknown).
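The three-level hierarchy described above can be pictured as a small data structure. The following is a minimal sketch only, not MEROPS code: the function and class names are hypothetical, the identifier pattern is inferred from the single example in the text (C01.060 in family C1), and the clan table contains only the one family-to-clan pair the text mentions. Clan membership depends on tertiary structure, so it has to come from a lookup table and cannot be derived from the identifier.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: only the one example given in the text (family C1 in clan CA).
# A real table would have to be taken from the database itself.
FAMILY_TO_CLAN = {"C1": "CA"}

# Assumption: an identifier is a letter, a two-digit family number, a dot and
# a three-character suffix, as in "C01.060". Inferred from one example only.
_ID_PATTERN = re.compile(r"^([A-Z])(\d{2})\.(\w{3})$")


@dataclass(frozen=True)
class Classification:
    identifier: str       # protein species level, e.g. "C01.060"
    family: str           # family level, e.g. "C1"
    clan: Optional[str]   # clan level, e.g. "CA"; None if not in the table


def classify(identifier: str) -> Classification:
    """Place an identifier in the species -> family -> clan hierarchy."""
    match = _ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    letter, number, _suffix = match.groups()
    # Drop the zero padding of the family number: "C" + "01" -> "C1".
    family = f"{letter}{int(number)}"
    return Classification(identifier, family, FAMILY_TO_CLAN.get(family))


print(classify("C01.060"))
```

Keeping the clan in a separate table mirrors the text: species and family follow from sequence, while clan assignment rests on structural evidence that the identifier does not encode.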
Over 117,000 peptidase homologues are classified into 3114 protein species, 205 families and 52 clans, and 12,104 protein inhibitors are classified into 663 protein species, 64 families and 33 clans.
The database includes manually curated summaries for each clan, family and protein species. There are also sequence alignments and manually curated bibliographies (with over 41,000 references) at every level. In addition to protein inhibitors, we also include 158 manually curated summaries for synthetic and naturally occurring small molecule inhibitors. There is also a summary page for each organism that lists all known homologues and, for organisms with a completely sequenced genome, an analysis highlighting significant presences, absences or gene family expansions. The MEROPS database includes known peptidase substrates: naturally occurring peptides and proteins, and synthetic substrates. Currently there are 4091 cleavages of synthetic substrates and 95,413 cleavages of proteins (of which 74,740 are physiological). Cleavages in proteins are mapped to UniProt entries. An alignment of very close homologues of each substrate sequence is shown, with the residues around each cleavage site highlighted to indicate whether or not the peptidase is known to accept the amino acid at that position. Cleavage sites that are conserved are likely to be physiological; cleavage sites that are not conserved may be pathological for the species in which they occur or coincidental.
The MEROPS data is freely available to download from our FTP site ("http://ftp.sanger.ac.uk/pub/MEROPS":http://ftp.sanger.ac.uk/pub/MEROPS) and via our Distributed Annotation System (DAS) server ("http://das.sanger.ac.uk/das/merops":http://das.sanger.ac.uk/das/merops).
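The comparison of cleavage sites across close homologues described above can be illustrated with a toy example. This is a sketch under stated assumptions, not the method used by the database: the four-position window, the acceptance table, the species names and the sequence windows are all invented for illustration, and "conserved" is taken here to mean that every aligned homologue carries an accepted residue at every position.

```python
from typing import Dict, List, Set

# Hypothetical positions around the scissile bond, which lies between P1 and P1'.
POSITIONS: List[str] = ["P2", "P1", "P1'", "P2'"]

# Hypothetical acceptance table: residues the peptidase is known to accept at
# each position. Toy values only, not taken from MEROPS.
ACCEPTED: Dict[str, Set[str]] = {
    "P2": set("LVF"),
    "P1": set("RK"),
    "P1'": set("GAS"),
    "P2'": set("LIV"),
}


def flag_window(window: str) -> List[bool]:
    """Return, per position, whether the residue is one the peptidase accepts."""
    if len(window) != len(POSITIONS):
        raise ValueError("window length must match the number of positions")
    return [residue in ACCEPTED[pos] for pos, residue in zip(POSITIONS, window)]


def site_is_conserved(windows: List[str]) -> bool:
    """Assumed criterion: every homologue is acceptable at every position."""
    return all(all(flag_window(window)) for window in windows)


# Invented aligned windows from three hypothetical homologues.
homologues = {"species_a": "LRGL", "species_b": "VKAI", "species_c": "LRPL"}

for name, window in homologues.items():
    print(name, flag_window(window))
print("conserved:", site_is_conserved(list(homologues.values())))
```

In this toy data the third window carries a residue at P1' that is not in the accepted set, so the site is not counted as conserved across all three homologues.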