Abstract

Substantial text mining efforts are being devoted to detecting protein mentions and protein-protein interaction (PPI) relations in scientific articles [1,2]. In this context, the BioCreative challenge showed that the correct identification of the individual interactor proteins is still a challenging task, especially when using full-text articles [2]. A systematic analysis of the particularities of protein mentions in the context of interaction descriptions has nonetheless been missing. Experimental biologists often use specific fusion proteins or protein tags such as -GST, -His, -Myc, FLAG-, antibodies or fluorescent protein (GFP, YFP, CFP and RFP) tags to detect and visualize interactions. These tags are often mentioned as affixes of the target proteins in the literature. The importance of affixes in biomedical text mining has been addressed for affixal negation expressions [3] and for general posttranslational modifications of proteins [4], and can also be observed in trigger verbs used for interaction extraction [5]. We carried out a detailed study on the presence of common affixes belonging to interactor protein mentions in full-text sentences considered by database curators as evidential support for experimentally characterized physical protein interactions. Furthermore, we tried to determine whether specific affixes might be useful to detect PPI-relevant articles and to correlate affix mentions with particular interaction detection methods. Based on an examination of over 3,000 of these interaction evidence passages, we have compiled a collection of 277 interaction-relevant affixes (89 suffixes, 176 prefixes and 12 that could be both), which were structured into 36 affix tag classes (26 super-affix and 10 combined or sub-affix classes). Figure 1A shows the frequency of mention of each of the affix tag classes.
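As a rough illustration of how tag affixes attached to protein mentions might be spotted in a sentence, the following minimal Python sketch matches a handful of the tags named above. It is not the authors' method: the tag list is only the small subset mentioned in this abstract (not the 277-entry PPIAD), and the token pattern, the function name `find_affixed_mentions` and the hyphen-based matching are assumptions made for illustration.

```python
import re

# Illustrative subset of tags named in the abstract (NOT the full PPIAD).
TAGS = ("GST", "His", "Myc", "FLAG", "GFP", "YFP", "CFP", "RFP")
_TAG = "|".join(TAGS)

# Crude assumption: a protein-like token is a letter followed by
# one or more letters or digits.
_NAME = r"[A-Za-z][A-Za-z0-9]+"

# Tag written before the name (e.g. "GST-Rab5") or after it (e.g. "Rab5-GFP").
PREFIX_RE = re.compile(rf"\b({_TAG})-({_NAME})\b")
SUFFIX_RE = re.compile(rf"\b({_NAME})-({_TAG})\b")


def find_affixed_mentions(sentence):
    """Return (tag, position, name) tuples for hyphenated tag affixes.

    This is a hypothetical helper; it will also report false positives
    such as "His-tagged", which a real dictionary would have to handle.
    """
    hits = []
    for match in PREFIX_RE.finditer(sentence):
        hits.append((match.group(1), "prefix", match.group(2)))
    for match in SUFFIX_RE.finditer(sentence):
        hits.append((match.group(2), "suffix", match.group(1)))
    return hits


if __name__ == "__main__":
    example = "GST-Rab5 bound to Rabaptin5-GFP in pull-down assays."
    for tag, position, name in find_affixed_mentions(example):
        print(tag, position, name)
```

A dictionary-based approach like this is easy to inspect and extend, but matching is case-sensitive and purely lexical, so it would need curated normalization before being applied to real evidence passages.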
In the resulting PPI affix dictionary (PPIAD), each affix tag class has been manually linked to experimental qualifiers represented by associated PSI-MI ontology [5] concepts, based on their concept definitions. Additionally, statistical associations of affix tag classes with PSI-MI interaction detection method concepts have been derived from curator-based annotations of the evidence passages. To overcome the limited scope and lexical coverage of the terms contained in the PSI-MI ontology, we built the BioMethod Lexicon, a collection of experimental method terms important for protein interaction and gene regulation relations, and characterized method term co-mentions with affix tag classes. Within a total set of 6,300 interaction evidence sentences, 1,946 (31%) mentioned at least one interaction-relevant affix, which shows that affixes are a relatively common feature of interaction descriptions. Using a Chi-square test of the associations between affix classes and interaction detection method annotations, we found that some affix classes were strongly associated with interaction methods, such as: MI:0096 - AF_21 (MI: pull down and PPIAD: gst_pull_down_tag), MI:0676 - AF_6 (tandem affinity purification and Tandem_Affinity_Purification_tag), MI:0018 - AF_10 (two hybrid and Gal4_tag), MI:0006 - AF_4 (anti bait coimmunoprecipitation and Antibody_tag), MI:0055 - AF_15 (fluorescent

* Correspondence: mkrallinger@cnio.es
Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, Madrid, Spain
Full list of author information is available at the end of the article
Krallinger et al. BMC Bioinformatics 2010, 11(Suppl 5):O1
http://www.biomedcentral.com/1471-2105/11/S5/O1
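The Chi-square association test mentioned in the abstract can be sketched for a single affix class and a single detection method as a 2x2 contingency table. The abstract does not report the underlying counts or the exact test variant, so the code below is only an assumption-laden sketch: it uses the plain Pearson statistic without continuity correction, the function name `chi_square_2x2` is hypothetical, and the counts in the usage example are invented purely for illustration.

```python
# Critical value of the chi-square distribution for 1 degree of freedom
# at a significance level of 0.05.
CRITICAL_05_DF1 = 3.841


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table

                      method annotated   method not annotated
        affix present        a                   b
        affix absent         c                   d
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a marginal total is zero; statistic is undefined")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


if __name__ == "__main__":
    # HYPOTHETICAL counts, not taken from the study.
    statistic = chi_square_2x2(30, 10, 20, 140)
    associated = statistic > CRITICAL_05_DF1
    print(f"chi-square = {statistic:.2f}, significant at 0.05: {associated}")
```

With many affix classes and methods tested pairwise, a multiple-testing correction would normally be considered as well; whether the study applied one is not stated here.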


