The correlation between molecular function and intrinsic protein disorder is an important aspect of understanding the relationship between function, sequence and structure. This research was inspired by the statistical correlation evaluation method described by Xie et al. (J Proteome Res 6 (2007) 1882–1898; hereafter, the reference study), in which the authors analyzed the relationship between the structure and function of proteins from the Swiss-Prot database, with functions described by Swiss-Prot function keywords. In this research, we investigated whether the conclusions of the reference study hold for another dataset with richer functional annotation. We used the CAFA3 challenge training dataset, in which function is described with terms from the Gene Ontology (GO terms). To compare the results with the previous work, we associated the GO terms with the corresponding Swiss-Prot function keywords. The results were compared with the reference study by first repeating the analysis with Swiss-Prot function keywords and then with GO terms. We used the PONDR VSL2b disorder predictor to label over 66,000 CAFA3 proteins as putatively disordered or ordered. Out of 186 Swiss-Prot keywords (of the molecular function type) with more than 20 annotated proteins, we found 47 to be highly order related and 44 highly disorder related. Using the same dataset and annotation constraints, out of 1781 GO terms (of the molecular function type), we found 746 to be highly order related and 564 highly disorder related. The GO term results are presented as interactive graphs displaying the complex hierarchical structure of the Gene Ontology. Comparison of the two functional annotations, GO and Swiss-Prot keywords, showed consistent results in cases where it was possible to map a Swiss-Prot keyword to a corresponding GO term. 
Because such cases are few, we propose a new method for deriving the most likely missing mappings between Swiss-Prot keywords and GO terms by measuring the similarity (Jaccard index) between the sets of proteins annotated with different functions. Comparison with the reference study revealed a prevalence of binding-related functions (disorder related) in the current dataset, even though the same functions were not present in the previous results.
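
The idea of mapping keywords to GO terms through the Jaccard index can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy identifiers (`KW_A`, `GO_X`, `GO_Y`, `P1`...) and the rule of simply taking the highest-scoring GO term are assumptions made for illustration only.

```python
from typing import Dict, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |A & B| / |A | B|; taken as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def best_go_match(
    keyword_proteins: Set[str],
    go_to_proteins: Dict[str, Set[str]],
) -> Tuple[Optional[str], float]:
    """Return the GO term whose protein set is most similar to the keyword's.

    Assumption: the candidate mapping is the single highest-scoring term;
    a real analysis might also require a minimum score.
    """
    best_term, best_score = None, 0.0
    for term, proteins in go_to_proteins.items():
        score = jaccard(keyword_proteins, proteins)
        if score > best_score:
            best_term, best_score = term, score
    return best_term, best_score


# Hypothetical toy annotations (protein IDs per function label).
keyword_to_proteins = {"KW_A": {"P1", "P2", "P3"}}
go_to_proteins = {
    "GO_X": {"P1", "P2", "P3", "P4"},
    "GO_Y": {"P3", "P5"},
}

mapping = {
    kw: best_go_match(proteins, go_to_proteins)
    for kw, proteins in keyword_to_proteins.items()
}
print(mapping)
```

In this toy example `KW_A` shares three of four proteins with `GO_X` and one of four with `GO_Y`, so `GO_X` is selected as the candidate mapping.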