Abstract
Background: To improve the outcomes of biological pathway analysis, a better way of integrating pathway data is needed. Ontologies can be used to organize data from disparate sources, and we leverage the Pathway Ontology as a unifying ontology for organizing pathway data. We aim to associate pathway instances from different databases to the appropriate class in the Pathway Ontology.

Results: Using a supervised machine learning approach, we trained neural networks to predict mappings between Reactome pathways and Pathway Ontology (PW) classes. For 2222 Reactome classes, the neural network (NN) model generated 10,952 class recommendations. We compared against a baseline bag-of-words (BOW) model for predicting correct PW classes. A 5% subset of Reactome pathways (111 pathways) was randomly selected, and the corresponding class recommendations from both models were evaluated by two curators. The precision of the BOW model was higher (0.49 for BOW and 0.39 for NN), but the recall was lower (0.42 for BOW and 0.78 for NN). Around 78% of Reactome pathways received pertinent recommendations from the NN model.

Conclusions: The neural predictive model produced meaningful class recommendations that assisted PW curators in selecting appropriate class mappings for Reactome pathways. Our methods can be used to reduce the manual effort associated with ontology curation, and more broadly, for augmenting the curators’ ability to organize and integrate data from pathway databases using the Pathway Ontology.
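To make the reported precision and recall figures concrete, the following is a minimal sketch of how the two metrics could be computed for one pathway's class recommendations. It is an illustration only: the exact evaluation protocol is not described in this section, and the function name `precision_recall` and the placeholder class identifiers are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one set of class recommendations.

    recommended: classes proposed by a model (hypothetical input)
    relevant:    classes a curator judged correct (hypothetical input)
    """
    recommended = set(recommended)
    relevant = set(relevant)
    hits = recommended & relevant  # recommendations the curator accepted

    # Precision: fraction of recommendations that were correct.
    precision = len(hits) / len(recommended) if recommended else 0.0
    # Recall: fraction of correct classes that were recommended.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Placeholder identifiers, not real Pathway Ontology IDs.
recommended = {"PW:A", "PW:B", "PW:C", "PW:D"}
relevant = {"PW:A", "PW:B", "PW:E"}

precision, recall = precision_recall(recommended, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A model that emits more recommendations per pathway tends to raise recall at the cost of precision, which matches the trade-off reported between the NN and BOW models.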
Highlights
To improve the outcomes of biological pathway analysis, a better way of integrating pathway data is needed
Previous studies have described the differences that exist between pairs of pathway databases [8,9,10,11], and in our prior work, we have categorically summarized ways in which pathway representations have been found to differ between many common pathway databases [12]
We propose and implement a supervised learning framework for inferring mappings between pathways from pathway databases and the Pathway Ontology (PW), with a goal of reducing the hours associated with manual curation
Summary
To improve the outcomes of biological pathway analysis, a better way of integrating pathway data is needed. The same or a similar pathway may be represented in multiple databases. Metaresources such as Pathway Commons [4] and ConsensusPathDB [5] allow for querying and access to pathways from different databases, but lack the ability to collapse redundant pathways between databases. Other resources such as PathCards [6] or ReCiPa [7] use statistical methods to detect gene overlap between two pathways, merging pathways with significantly overlapping entities into superpathways to reduce membership redundancy. These methods fail to retain the functional boundaries of pathways, which are crucial for interpreting pathway analysis results, i.e., allowing gene expression differences to be aggregated and interpreted at a functional level.
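The idea of gene overlap between two pathways can be illustrated with a short sketch. The code below is not the method used by PathCards or ReCiPa, whose statistics are not described here; it assumes two simple set-based measures (the Jaccard index and the overlap coefficient) and uses made-up gene symbols and pathway names.

```python
def jaccard(a, b):
    """Jaccard index: shared members divided by all distinct members."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a, b):
    """Shared members divided by the size of the smaller set."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Hypothetical gene memberships for two pathway records.
pathway_x = {"GENE1", "GENE2", "GENE3", "GENE4"}
pathway_y = {"GENE3", "GENE4", "GENE5"}

jaccard_score = jaccard(pathway_x, pathway_y)
overlap_score = overlap_coefficient(pathway_x, pathway_y)
print(f"jaccard={jaccard_score:.2f} overlap={overlap_score:.2f}")
```

Scores like these depend only on gene membership, so merging pathways above some threshold can join records that share genes but describe different functions, which is the loss of functional boundaries noted above.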