Abstract
Annotating the functions of genes and proteins is an essential step in genome analysis. Information extraction techniques have been proposed to obtain function information about genes and proteins from the biomedical literature. However, the performance of most of these techniques is unsatisfactory because concepts are expressed with large variability in the biomedical literature. This paper proposes a framework that improves literature-based gene function annotation by considering both the textual information in the literature and the functions of genes whose sequences are similar to that of a target gene. The framework collects three types of evidence: (i) textual information about gene functions, obtained by matching keywords of the gene functions; (ii) gene function information from the known functions of genes with sequences similar to the target gene; and (iii) the prior probability that a gene function is associated with an arbitrary gene, obtained by mining the known gene functions in curated databases. A supervised learning method is used to obtain the weights for combining the three types of evidence and assigning appropriate Gene Ontology terms to target genes. Empirical studies on two testbeds demonstrate that combining sequence similarity scores, function prior probabilities and textual information improves the accuracy of gene function annotation in the literature. The F-measure scores obtained with the proposed framework are substantially higher than those of solutions in prior research.
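The idea of learning weights to combine the three evidence types can be illustrated with a minimal sketch. The abstract does not name the supervised learning method, so logistic regression trained by gradient descent is assumed here purely for illustration. The function names, the feature scaling to [0, 1], the candidate term labels and all numeric values are hypothetical toy inputs, not data or results from the paper.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_weights(examples, labels, lr=0.5, epochs=2000):
    """Fit one weight per evidence type, plus a bias, by batch gradient descent.

    Assumption: logistic regression stands in for the unspecified
    supervised learning method mentioned in the abstract.
    """
    n_features = len(examples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(examples, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = pred - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(examples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(examples)
    return weights, bias


def score(weights, bias, evidence):
    """Combine (text match, sequence similarity, prior) into one score."""
    return sigmoid(sum(w * e for w, e in zip(weights, evidence)) + bias)


# Hypothetical training pairs: each tuple is
# (keyword-match score, sequence-similarity score, prior probability)
# for one (gene, candidate term) pair; the label is 1 if the annotation is correct.
TRAIN_X = [
    (0.9, 0.8, 0.30),
    (0.7, 0.9, 0.20),
    (0.8, 0.6, 0.25),
    (0.2, 0.1, 0.05),
    (0.1, 0.3, 0.10),
    (0.3, 0.2, 0.02),
]
TRAIN_Y = [1, 1, 1, 0, 0, 0]

weights, bias = train_weights(TRAIN_X, TRAIN_Y)

# Hypothetical candidate terms for one target gene, ranked by combined score.
candidates = {
    "candidate_term_A": (0.85, 0.75, 0.20),
    "candidate_term_B": (0.15, 0.20, 0.05),
}
ranked = sorted(
    candidates,
    key=lambda term: score(weights, bias, candidates[term]),
    reverse=True,
)
strong = score(weights, bias, candidates["candidate_term_A"])
weak = score(weights, bias, candidates["candidate_term_B"])
```

In this sketch a candidate term would be assigned to the target gene when its combined score exceeds a chosen threshold such as 0.5. The threshold and the linear form of the combination are design assumptions and may differ from the framework described in the paper.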