Abstract
Genome sequences are available for increasing numbers of organisms. The proteomes (the protein complement expressed by the genome) of some of these organisms are being studied with two-dimensional gel electrophoresis, but the identification of thousands of proteins on two-dimensional gels remains a challenge. Recent progress with mass spectrometric and traditional sequencing methods has increased the speed, sensitivity, and ease of protein sequence analysis. Although these methods can be used to produce extensive sequence information, they are also ideal for rapidly generating amino- and carboxy-terminal ‘sequence tags’ of six amino acids or fewer. To investigate the application of such sequence tags to the identification of proteins separated on two-dimensional gels, we have written a program, TagIdent, to match a protein sequence of up to six amino acids against entries in the SWISS-PROT database. Important features of the program are that it allows the user to specify (optionally) the estimated isoelectric point and mass, one or more species of organism to match against, and whether the sequence data are amino- or carboxy-terminal; in this way searches are highly directed. This is in contrast to BLAST, BLITZ or FASTA, which are global searching tools that either cannot search with very small sequences or return lists containing many irrelevant proteins. TagIdent is available on the world-wide web at http://expasy.hcuge.ch/www/tools.html and results are sent by e-mail.

Use of TagIdent with proteins from organisms for which the genome has been completely, or almost completely, sequenced shows that sequence tags have surprising specificity. Figure 1 shows that a protein from an Escherichia coli two-dimensional gel, sequenced with rapid Edman degradation for four cycles only [1: Wilkins MR, Ou K, Appel RD, Sanchez J-C, Yan JX, Golaz O, et al. Rapid protein identification using N-terminal “sequence tag” and amino acid analysis. Biochem Biophys Res Commun. 1996; 221 (96205323): 609-613], was identified from among 223 other candidate proteins within the specified windows of isoelectric point (pI) and molecular mass. The identity of the protein was confirmed by using the same sample for amino-acid composition identification. The theoretical ‘identification’ of 50 randomly selected proteins from E. coli using sequence tags of three, four or five amino acids and appropriate pI and mass windows revealed the same trend. At the amino terminus, 68% of proteins could be uniquely identified with a three amino-acid tag, 90% with four amino acids, and 94% with five amino acids. The remaining proteins were not uniquely identified, but were correctly assigned as members of a family.

How accurate is the program, and how widely can it be applied? Accurate identification with sequence tags as described here relies on all proteins from an organism being in sequence databases. When that is the case, if only one protein within a given pI and mass range is found with a certain amino- or carboxy-terminal sequence tag, one can be confident that there is no other, as yet undescribed, protein that could otherwise match the tag. In fully sequenced organisms, the procedure is thus self-checking. The specificity of sequence tags may be an issue in larger organisms: whereas there are (for example) 3 200 000 (20^5) possible five amino-acid tags, protein amino termini have biased sequences and many amino termini are shared. However, protein carboxyl termini have almost random sequences (data not shown), so their sequence tags should be more specific. Other factors to consider will be the accuracy of sequence data that can be obtained from proteins purified from two-dimensional gels, and the accuracy of prediction of protein open reading frames in genome/proteome databases. Large-scale protein characterization projects will define the effect of these factors and thus the utility of sequence tags for protein identification.
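
The directed search described above can be illustrated with a short Python sketch. It is not TagIdent's actual implementation: the `Entry` record, the `match_tag` function, its parameter names, the default pI and mass windows, and the toy entries are all assumptions invented for this example. It only shows the general idea of filtering candidates by species, pI window and mass window before comparing a short tag with one terminus of each sequence.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Entry:
    """Hypothetical, simplified stand-in for a protein database entry."""
    name: str
    species: str
    sequence: str  # one-letter amino-acid codes
    pi: float      # theoretical isoelectric point
    mass: float    # theoretical molecular mass in Da


def match_tag(
    entries: Iterable[Entry],
    tag: str,
    terminus: str = "N",
    species: Optional[Set[str]] = None,
    pi: Optional[float] = None,
    pi_window: float = 0.5,        # assumed default: +/- 0.5 pH units
    mass: Optional[float] = None,
    mass_tolerance: float = 0.2,   # assumed default: +/- 20% of the estimate
) -> List[Entry]:
    """Return entries whose chosen terminus carries the tag, after
    applying the optional species, pI and mass filters."""
    if terminus not in ("N", "C"):
        raise ValueError("terminus must be 'N' or 'C'")
    tag = tag.upper()
    hits = []
    for entry in entries:
        if species is not None and entry.species not in species:
            continue
        if pi is not None and abs(entry.pi - pi) > pi_window:
            continue
        if mass is not None and abs(entry.mass - mass) > mass_tolerance * mass:
            continue
        if terminus == "N":
            matched = entry.sequence.startswith(tag)
        else:
            matched = entry.sequence.endswith(tag)
        if matched:
            hits.append(entry)
    return hits


# Toy entries invented for illustration; they are not real database records.
TOY_DB = [
    Entry("TOY_A", "ECOLI", "MKTAYIAKQR", 5.4, 30000.0),
    Entry("TOY_B", "ECOLI", "MKTAGLLWNE", 5.6, 31000.0),
    Entry("TOY_C", "ECOLI", "MKTAYIPPDG", 8.9, 62000.0),
    Entry("TOY_D", "YEAST", "MKTAYIAKQR", 5.4, 30000.0),
]

# A four-residue N-terminal tag leaves two candidates in the window;
# one more residue makes the match unique.
four = match_tag(TOY_DB, "MKTA", species={"ECOLI"}, pi=5.5, mass=30500.0)
five = match_tag(TOY_DB, "MKTAY", species={"ECOLI"}, pi=5.5, mass=30500.0)
print([e.name for e in four])  # ['TOY_A', 'TOY_B']
print([e.name for e in five])  # ['TOY_A']

# Number of distinct five-residue tags over the 20 standard amino acids.
print(20 ** 5)  # 3200000
```

In this toy database, lengthening the tag from four to five residues reduces the candidates from two to one, which mirrors in miniature the trend reported for the E. coli test set. Passing `terminus="C"` compares the tag with the end of each sequence instead of its start.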