As the price of genome sequencing has been decreasing rapidly and can be expected to keep falling over the next 10 years, new microbial genome sequences will become available at a correspondingly faster rate. In most genome projects, the first step after acquiring a genome sequence is predicting protein‐encoding open reading frames (pORFs). Small proteins or peptides encoded in microbial genomes, loosely defined as those shorter than 50 amino acids, have been largely underestimated. Recent focused functional genomics efforts have identified a number of new small proteins encoded in the genomes of Gram‐negative and Gram‐positive bacteria and of fungi (Kastenmayer et al., 2006; Li et al., 2008; Hemm et al., 2010; Hobbs et al., 2010; Bitton et al., 2011). Increasing evidence shows that small proteins participate in a wide array of cellular processes and act through highly diverse mechanisms. A recent review (Hobbs et al., 2011) highlights examples of small proteins that, in addition to the well‐conserved small ribosomal proteins, participate in cell signalling or regulation, act as antibiotics and toxins/anti‐toxins, alter membrane features, act as chaperones, stabilize protein complexes or serve as structural proteins (Table 1; Fig. 1). Failure to recognize a pORF encoding a small protein means that these important cell constituents will be missed. Here, we briefly summarize the problems that arise when searching for genes encoding small proteins, and what could be done to improve the search process.

Table 1. Types and functions of small proteins.

Figure 1. Structure of small proteins. Small secreted proteins exhibit diversity in their three‐dimensional structures and can contain unique intramolecular linkages or modified amino acids. For example, the mature form of (A) subtilosin (PDB: 1PXQ) is ...
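To illustrate why a pORF encoding a small protein can be missed, the sketch below scans both strands of a DNA sequence for ATG...stop ORFs and keeps only those above a minimum length. It is a minimal illustration under stated assumptions, not the method of any particular gene predictor: the function names `find_orfs` and `reverse_complement`, the `min_codons` parameter, the ATG-only start rule, and the cutoff values in the usage example (100 versus 10 codons) are all hypothetical choices made for this example.

```python
"""Minimal six-frame ORF scanner with a length cutoff (illustrative sketch only)."""

# Simplification (assumption): only ATG is treated as a start codon.
# Real bacterial genes also use alternative starts such as GTG and TTG.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int) -> list[tuple[str, int, int]]:
    """Return (strand, start, end) for ATG...stop ORFs with >= min_codons sense codons.

    Coordinates are 0-based and end-exclusive (the stop codon is included),
    measured on the scanned strand's own sequence. min_codons is a
    hypothetical cutoff parameter, standing in for a predictor's length filter.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START:
                    start = i  # open an ORF at the first in-frame ATG
                elif start is not None and codon in STOPS:
                    n_codons = (i - start) // 3  # sense codons, stop excluded
                    if n_codons >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # close the ORF and keep scanning
    return orfs


if __name__ == "__main__":
    # Hypothetical toy ORF: ATG + 29 GCT codons + TAA, i.e. a 30-amino-acid peptide.
    toy = "ATG" + "GCT" * 29 + "TAA"
    print(find_orfs(toy, min_codons=100))  # [] : the small ORF is filtered out
    print(find_orfs(toy, min_codons=10))   # [('+', 0, 93)]
```

The only difference between the two calls is the length threshold: the same 30‐codon ORF, which falls within the "less than 50 amino acids" range discussed above, disappears from the output as soon as the cutoff exceeds its length.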