Abstract
Background: False occurrences of functional motifs in protein sequences can be considered as random events due solely to the sequence composition of a proteome. Here we use a numerical approach to investigate the random appearance of functional motifs, with the aim of addressing biological questions such as: How are organisms protected from undesirable occurrences of motifs otherwise selected for their functionality? Has the random appearance of functional motifs in protein sequences been affected during evolution?
Results: We analyse the occurrence of functional motifs in random sequences and compare it to that observed in biological proteomes; the behaviour of random motifs is also studied. For most motifs, the number of false positives is statistically similar to the number of times they appear in randomized proteomes (the expected number of false positives). Interestingly, about 3% of the analysed motifs behave differently and appear in biological proteomes less often than they do in random sequences. In some of these cases, a mechanism of evolutionary negative selection is apparent; this helps to prevent unwanted functionalities that could interfere with cellular mechanisms.
Conclusion: Our thorough statistical and biological analysis showed that several mechanisms and evolutionary constraints affect the appearance of functional motifs in protein sequences.
Highlights
False occurrences of functional motifs in protein sequences can be considered as random events due solely to the sequence composition of a proteome
The PROSITE database provides, for each entry, complete lists of Swiss-Prot proteins manually verified for true positive (TP), false positive (FP), and false negative (FN) assignments [4]
True and false positives of PROSITE patterns are manually verified by expert curators through both the literature and the information retrieved from other databases such as Swiss-Prot or Pfam [13]
Summary
False occurrences of functional motifs in protein sequences can be considered as random events due solely to the sequence composition of a proteome. Sternberg [2] used the calculated expectations as a benchmark for evaluating motif matches on the Swiss-Prot database as annotated in PROSITE; Nevill-Manning and co-workers [3] used such expectations to assess the specificity of motifs exhaustively generated from a multiple sequence alignment of related proteins. From this perspective, the number of occurrences of a motif in a set of proteins can be regarded as the sum of the functional occurrences and the random occurrences, i.e. motif matches explained by the sequence composition alone [6].
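The idea of random occurrences explained by sequence composition alone can be illustrated with a minimal Python sketch. It is not the authors' procedure: it assumes a fixed-length motif given as one set of allowed residues per position, independent residues drawn from the overall composition, and a per-protein shuffle as the randomization. The function names (`expected_random_matches`, `count_matches`, `shuffle_proteome`) and the toy sequences are hypothetical.

```python
import random
import re
from collections import Counter


def composition(seqs):
    """Relative frequency of each residue over all sequences."""
    counts = Counter("".join(seqs))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def expected_random_matches(motif_positions, seqs):
    """Expected number of matches if residues were independent draws
    from the overall composition (an assumption of this sketch).

    motif_positions: list of sets, one set of allowed residues per position.
    """
    freq = composition(seqs)
    p_match = 1.0
    for allowed in motif_positions:
        p_match *= sum(freq.get(aa, 0.0) for aa in allowed)
    k = len(motif_positions)
    # Number of start positions where a motif of length k can fit.
    windows = sum(max(len(s) - k + 1, 0) for s in seqs)
    return p_match * windows


def count_matches(regex, seqs):
    """Observed number of matches, counting overlapping ones."""
    pattern = re.compile(f"(?=(?:{regex}))")
    return sum(1 for s in seqs for _ in pattern.finditer(s))


def shuffle_proteome(seqs, rng):
    """Shuffle residues within each protein, preserving its composition."""
    shuffled = []
    for s in seqs:
        residues = list(s)
        rng.shuffle(residues)
        shuffled.append("".join(residues))
    return shuffled


# Toy usage: motif "A then C" on two made-up sequences.
toy = ["ACACGT", "TTACAA"]
motif = [{"A"}, {"C"}]
expected = expected_random_matches(motif, toy)
observed = count_matches("AC", toy)
randomized = count_matches("AC", shuffle_proteome(toy, random.Random(0)))
```

Under this reading, a motif whose observed count in a real proteome falls well below both `expected` and the counts from repeated shuffles would be a candidate for the under-represented behaviour described above; deciding whether the difference is significant requires a statistical test that this sketch does not include.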