Abstract

A proteinpsilas subcellular localization is considered an essential part of the description of its associated biomolecular phenomena. As the volume of biomolecular reports has increased, there has been a great deal of research on text mining to detect protein subcellular localization information in documents. It has been argued that linguistic information, especially syntactic information, is useful for identifying the subcellular localizations of proteins of interest. However, previous systems for detecting protein subcellular localization information used only shallow syntactic parsers, and showed poor performance. Thus, there remains a need to use a full syntactic parser and to apply deep linguistic knowledge to the analysis of text for protein subcellular localization information. In addition, we have attempted to use semantic information from the WordNet thesaurus. To improve performance in detecting protein subcellular localization information, this paper proposes a three-step method based on a full syntactic dependency parser and semantic information. In the first step, we construct syntactic dependency paths from each protein to its location candidate. In the second step, we retrieve root information of the syntactic dependency paths. In the final step, we extract syn-semantic patterns of protein subtrees and location subtrees. From the root and subtree nodes, we extract syntactic category and syntactic direction as syntactic information, and synset offset of the WordNet thesaurus as semantic information. According to the root information and syn-semantic patterns of subtrees, we extract (protein, localization) pairs. Even with no biomolecular knowledge, our method shows reasonable performance in experimental results using Medline abstract data. In fact, our proposed method gave an F-measure of 74.53% for training data and 58.90% for test data, significantly outperforming previous methods, by 12-25%.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call