Our previous proteomics analysis of small proteins expressed in human K562 cells provided the first direct evidence of translation of upstream ORFs in human full-length cDNAs (Oyama, M., Itagaki, C., Hata, H., Suzuki, Y., Izumi, T., Natsume, T., Isobe, T., and Sugano, S. (2004) Analysis of small human proteins reveals the translation of upstream open reading frames of mRNAs. Genome Res. 14, 2048-2052). In the present study, we performed an in-depth proteomics analysis of human K562 and HEK293 cells using a two-dimensional nano-liquid chromatography-tandem mass spectrometry system. The results led to the identification of eight protein-coding regions in addition to 197 small proteins with a theoretical mass of less than 20 kDa that were already annotated as coding sequences in the curated mRNA database. In addition to the upstream ORFs in the presumed 5'-untranslated regions of mRNAs, bioinformatics analysis based on accumulated 5'-end cDNA sequence data provided evidence of novel short coding regions that were likely to be translated from an upstream non-AUG start site or from new short transcript variants generated by the use of downstream alternative promoters. Protein expression analysis of the GRINL1A gene revealed that translation from the most upstream start site occurred on the minor alternative splicing transcript, whereas this initiation site was not used on the major mRNA, resulting in translation of the downstream ORF from the second initiation codon. These findings reveal a novel post-transcriptional system that can augment the human proteome through the alternative use of diverse translation start sites coupled with transcriptional regulation through alternative promoters or splicing, leading to increased complexity of the short protein-coding regions defined by the human transcriptome.