Abstract

Sequences of intracellular and extracellular soluble proteins were analyzed statistically in terms of amino acid composition and residue-pair frequencies. Residue-pair frequencies were calculated for sequence separations from (n, n + 1) to (n, n + 5) and converted into scoring parameters. For each test protein, the single-residue and residue-pair parameters were then applied to calculate a total score. By our definition, a positive score indicates an intracellular protein, whereas a negative score indicates an extracellular one. The parameter set was derived from 894 sequences representing different protein families in the PIR database, and was assessed by application to a test set of 379 proteins. The results showed that 88% of intracellular and 84% of extracellular proteins were correctly assigned. The discrimination power improved by about 8% compared with the previous study, which used composition data alone. Segregation of intracellular and extracellular proteins is also observed by other criteria, such as structural class (intracellular proteins prefer α and α/β types, whereas extracellular proteins prefer β and α + β types). Segregation by sequence was found to be more reliable for distinguishing intracellular from extracellular proteins than methods using structural class. Possible causes for this segregation by sequence are discussed.
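The scoring scheme described above can be illustrated with a minimal sketch. The abstract does not state how frequencies are converted into scoring parameters, so the sketch assumes a log-odds ratio of intracellular to extracellular frequencies with a pseudocount. That conversion, the function names (`count_features`, `train_parameters`, `score`, `classify`), and the handling of a score of exactly zero are illustrative assumptions, not the authors' published procedure.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MAX_GAP = 5  # separations (n, n+1) through (n, n+5), as in the abstract


def count_features(sequences):
    """Count single residues (gap 0) and residue pairs at gaps 1..MAX_GAP."""
    counts = Counter()
    for seq in sequences:
        for i, a in enumerate(seq):
            counts[(0, a)] += 1
            for gap in range(1, MAX_GAP + 1):
                if i + gap < len(seq):
                    counts[(gap, a, seq[i + gap])] += 1
    return counts


def feature_keys(gap):
    """All possible feature keys for one gap (20 singles or 400 pairs)."""
    if gap == 0:
        return [(0, a) for a in AMINO_ACIDS]
    return [(gap, a, b) for a in AMINO_ACIDS for b in AMINO_ACIDS]


def train_parameters(intra_seqs, extra_seqs, pseudocount=1.0):
    """ASSUMPTION: parameter = log(intracellular freq / extracellular freq)."""
    intra = count_features(intra_seqs)
    extra = count_features(extra_seqs)
    params = {}
    for gap in range(MAX_GAP + 1):
        keys = feature_keys(gap)
        total_i = sum(intra[k] for k in keys) + pseudocount * len(keys)
        total_e = sum(extra[k] for k in keys) + pseudocount * len(keys)
        for k in keys:
            freq_i = (intra[k] + pseudocount) / total_i
            freq_e = (extra[k] + pseudocount) / total_e
            params[k] = math.log(freq_i / freq_e)
    return params


def score(seq, params):
    """Total score: sum of single-residue and residue-pair parameters."""
    total = 0.0
    for i, a in enumerate(seq):
        total += params.get((0, a), 0.0)
        for gap in range(1, MAX_GAP + 1):
            if i + gap < len(seq):
                total += params.get((gap, a, seq[i + gap]), 0.0)
    return total


def classify(seq, params):
    """Positive score -> intracellular; otherwise extracellular.

    ASSUMPTION: a score of exactly zero is reported as extracellular;
    the abstract does not say how ties are handled.
    """
    return "intracellular" if score(seq, params) > 0 else "extracellular"


if __name__ == "__main__":
    # Toy sequences invented for illustration only; not real proteins.
    toy_params = train_parameters(["AAAAKKKK", "AKAKAKAK"],
                                  ["CCCCSSSS", "CSCSCSCS"])
    print(classify("AKAKA", toy_params))
    print(classify("CSCSC", toy_params))
```

In a real setting the two training sets would be the labelled intracellular and extracellular sequences, and accuracy would be measured on a held-out test set, as the abstract describes. The sketch sums raw parameters without length normalization, since only the sign of the total decides the assignment.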
