Abstract

In this paper we study the problem of jointly encoding the amino acid sequence and the secondary structure information of proteins, in the format in which a growing number of proteins are stored in the Swiss-Prot database. The new method, dubbed ProtCompSecS, combines ProtComp, a compressor previously designed only for amino acid sequences, with a dictionary-based method in which the dictionary of patterns representing the secondary structure is obtained by suitably processing the Dictionary of Protein Secondary Structure (DSSP) database. We experimented with the protein sequences of 14 complete proteomes. When comparing the performance of the ProtCompSecS algorithm with that of the ProtComp algorithm on those sequences that have annotated secondary structure information, we found, surprisingly, that encoding both the sequence and the secondary structure information is more efficient than encoding the protein sequence alone (without knowledge of the secondary structure). This is a strong argument that the secondary structure has high descriptive value for modeling and understanding the primary structure (the amino acid sequence) of a protein.
