Abstract

Background: Functional annotation of genes is an important task in biology because it facilitates the characterization of gene relationships and the understanding of biochemical pathways. Gene functions can be described by standardized, structured vocabularies called bio-ontologies. Bio-ontology terms are assigned to genes by applying data mining and machine learning methods, such as maximum entropy and support vector machines (SVM), to datasets extracted from biomedical articles.

Purpose: The aim of this paper is to propose an alternative to the existing methods for functionally annotating genes. The methodology involves building classification models, validating them, representing the results graphically, and reducing the dimensionality of the dataset.

Methods: Classification models are constructed by linear discriminant analysis (LDA). The models are validated through statistical analysis and interpretation of the results, using techniques such as hold-out samples and test datasets, and metrics such as the confusion matrix, accuracy, recall, precision and F-measure. Graphical representations of the variables resulting from the classification models, such as boxplots, Andrews curves and scatterplots, are also used for validating and interpreting the results.

Results: The proposed methodology was applied to a dataset extracted from biomedical articles for 12 Gene Ontology terms. The validation of the LDA models and the comparison with the SVM show that LDA (mean F-measure 75.4%) outperforms the SVM (mean F-measure 68.7%) on this dataset.

Conclusion: The application of certain statistical methods can be beneficial for functional gene annotation from biomedical articles. In addition to performing well, the models produce results that can be interpreted and that give insight into the structure of the bio-text data.
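To make the validation metrics named under Methods concrete, the following is a minimal sketch of how accuracy, precision, recall and F-measure follow from the four cells of a binary confusion matrix. It assumes a two-class setting (a document either is or is not annotated with a given term), which the abstract does not specify. The class `BinaryConfusion`, the function `confusion_from_labels` and the label vectors are hypothetical illustrations, not the paper's code or data.

```python
from dataclasses import dataclass


@dataclass
class BinaryConfusion:
    """Cells of a 2x2 confusion matrix (hypothetical helper for illustration)."""
    tp: int  # predicted positive, truly positive
    fp: int  # predicted positive, truly negative
    fn: int  # predicted negative, truly positive
    tn: int  # predicted negative, truly negative

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    def f_measure(self) -> float:
        # Balanced F-measure: harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


def confusion_from_labels(y_true, y_pred, positive=1) -> BinaryConfusion:
    """Count the four confusion-matrix cells from paired true/predicted labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return BinaryConfusion(tp, fp, fn, tn)


if __name__ == "__main__":
    # Made-up labels for a held-out sample; not taken from the paper.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    cm = confusion_from_labels(y_true, y_pred)
    print(cm)
    print(cm.accuracy(), cm.precision(), cm.recall(), cm.f_measure())
```

A mean F-measure such as the ones reported under Results would then be an average of such per-model values, although the abstract does not state how the averaging was done.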
