Abstract
The functions of a protein can be inferred from the molecules that associate with it. A protein-molecule association can be identified from the co-occurrences of the two in biomedical literature. Recent computational methods build on this to predict protein functions, especially given the exponential growth of biomedical literature in the post-genomic era. These methods extract information from texts that explicitly describe the functions of proteins. We observe that some molecule terms pertaining to protein functions may co-occur only implicitly with proteins in biomedical texts. Thus, these methods may miss vital information about protein functions that is mentioned implicitly in the literature. To overcome this, we propose an Information Extraction system called PLPF that predicts the functions of proteins from both their explicit and implicit co-occurrences in biomedical texts with molecule terms pertaining to protein functions. It combines explicit term extraction methods with logic-based implicit term extraction methods. Let $\boldsymbol{t}$ be a functional term that co-occurs with an unannotated protein $\boldsymbol{p_{u}}$. PLPF will assign $\boldsymbol{p_{u}}$ the functional category $\boldsymbol{t}$ if: (1) the co-occurrences of the pair $\boldsymbol{t}-\boldsymbol{p_{u}}$ are explicit and the pair is semantically related based on the syntactic structures of sentences, or (2) the co-occurrences of the pair $\boldsymbol{t}-\boldsymbol{p_{u}}$ are implicit based on the inference rules of predicate logic. We evaluated PLPF by comparing it experimentally with four existing methods. The results showed marked improvement.
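The two-condition assignment rule above can be illustrated with a minimal Python sketch. This is not the PLPF implementation: the `CoOccurrence` record, its boolean fields, and the `assign_functions` helper are hypothetical names introduced here, and the sketch assumes that the syntactic-relatedness check and the predicate-logic inference have already been computed upstream and are available as flags.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class CoOccurrence:
    """Hypothetical record for one term-protein pair found in the literature."""
    term: str                   # functional term t
    protein: str                # protein identifier, e.g. the unannotated p_u
    explicit: bool              # True if the pair co-occurs explicitly in text
    semantically_related: bool  # assumed result of a syntactic-structure check
    logically_inferred: bool    # assumed result of predicate-logic inference


def assign_functions(protein: str, pairs: Iterable[CoOccurrence]) -> Set[str]:
    """Return the functional terms assigned to `protein` under the two rules."""
    assigned: Set[str] = set()
    for pair in pairs:
        if pair.protein != protein:
            continue
        # Rule (1): explicit co-occurrence that is also semantically related.
        rule_explicit = pair.explicit and pair.semantically_related
        # Rule (2): implicit co-occurrence supported by logical inference.
        rule_implicit = (not pair.explicit) and pair.logically_inferred
        if rule_explicit or rule_implicit:
            assigned.add(pair.term)
    return assigned
```

In this sketch, an explicit pair that fails the relatedness check is rejected, and so is an implicit pair that no inference rule supports. Only the final decision step is modeled; how the flags are derived is left open.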