Abstract

Detecting protein subcellular localization is essential for information extraction from biomolecular texts. There has been a great deal of text-mining research aimed at detecting protein subcellular localization information in documents. Previous studies have argued that linguistic information is useful for identifying the subcellular localizations of proteins. However, previous systems for detecting protein subcellular localizations have used only shallow syntactic parsers and have shown poor recall. Thus, there remains a need to apply deeper linguistic knowledge to the analysis of text. To improve performance in detecting protein subcellular localization information, this paper proposes a method based on a syntactic dependency tree and WordNet. From the syntactic dependency tree, we construct syntactic paths from a protein to each of its location candidates. Then, we retrieve syntactic and semantic information from the root, the protein subtree and the location subtree of each syntactic path. We extract syntactic category and syntactic direction as syntactic information, and the synset offset in the WordNet thesaurus as semantic information. Based on this information, we extract (protein, localization) pairs. Even without any biomolecular knowledge, our method shows reasonable performance in experiments using Medline abstract data. The experimental results show that our method outperforms previous methods, and that the obtained syntactic and semantic information contributes to the improvement in performance.
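The path construction described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a dependency tree encoded as head indices with one syntactic category (a part-of-speech tag) per token, takes the "root" of a path to be the lowest common ancestor of the protein and location tokens, and treats the two branches below that root as the protein and location sides, with "up" and "down" standing in for syntactic direction. The names Token, syntactic_path and path_features, the feature strings, and the example parse are hypothetical, and the WordNet synset-offset features are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    word: str
    pos: str   # syntactic category; Penn Treebank tags are an assumption
    head: int  # index of the head token; -1 marks the sentence root


def ancestors(tokens: List[Token], i: int) -> List[int]:
    """Return [i, head(i), head(head(i)), ..., sentence root]."""
    chain = [i]
    while tokens[chain[-1]].head != -1:
        chain.append(tokens[chain[-1]].head)
    return chain


def syntactic_path(tokens: List[Token], protein: int,
                   location: int) -> Tuple[int, List[int], List[int]]:
    """Split the dependency path protein -> location at its top node.

    Assumption: the path's root is the lowest common ancestor (LCA).
    protein_branch: nodes climbed from the protein up to the LCA (exclusive).
    location_branch: nodes descended from the LCA down to the location
    (LCA excluded).
    """
    up = ancestors(tokens, protein)
    down = ancestors(tokens, location)
    down_set = set(down)
    root = next(n for n in up if n in down_set)  # first shared ancestor
    protein_branch = up[:up.index(root)]
    location_branch = list(reversed(down[:down.index(root)]))
    return root, protein_branch, location_branch


def path_features(tokens: List[Token], protein: int,
                  location: int) -> List[str]:
    """Hypothetical feature strings: root category plus directed categories."""
    root, p_branch, l_branch = syntactic_path(tokens, protein, location)
    feats = [f"root_cat={tokens[root].pos}"]
    feats += [f"p_up={tokens[n].pos}" for n in p_branch]     # direction: up
    feats += [f"l_down={tokens[n].pos}" for n in l_branch]   # direction: down
    return feats


if __name__ == "__main__":
    # Hypothetical parse of "Bax is localized in mitochondria".
    sentence = [
        Token("Bax", "NNP", 2),
        Token("is", "VBZ", 2),
        Token("localized", "VBN", -1),
        Token("in", "IN", 2),
        Token("mitochondria", "NNS", 3),
    ]
    print(path_features(sentence, protein=0, location=4))
```

For the example parse, the path root is "localized", the protein side climbs from "Bax", and the location side descends through "in" to "mitochondria", giving `['root_cat=VBN', 'p_up=NNP', 'l_down=IN', 'l_down=NNS']`. A classifier over such features (plus semantic ones) could then decide whether the pair is a (protein, localization) pair.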
