sORF-encoded peptides (SEPs) are proteins of fewer than 100 amino acids encoded by small open reading frames (sORFs), and they play important roles in various life activities. Analysis of known SEPs showed that non-canonical initiation codons are commonly used. However, current analysis of SEP sequences relies mainly on bioinformatics prediction, and most predictions assume AUG as the start site, which may not be correct for every SEP. Here, chemical labeling was used to systematically analyze the N-terminal sequences of SEPs and thereby accurately define their start sites. In a comparison of labeling chemistries, dimethylation and guanidinylation were more efficient than acetylation. Acetonitrile (ACN) precipitation and heating precipitation performed better for SEP enrichment. As an N-terminal peptide enrichment material, Hexadhexaldehyde was superior to CNBr-activated agarose and NHS-activated agarose. Combining these methods, we identified 128 SEPs with 131 N-terminal sequences. Two-thirds of these N-terminal sequences are novel, and most of them start between the 11th and 31st amino acids of the original sequence. Some of the novel N-termini were produced by proteolysis or signal peptide removal. For some SEPs, the translation start sites were corrected to non-AUG start codons. One novel start codon was validated using GFP-tag vectors. These results demonstrate that chemical labeling approaches are useful for identifying the start codons of sORFs and the true N-termini of their encoded peptides, which helps to better characterize SEPs.