Abstract

High-throughput cell-data technologies such as single-cell RNA-seq create a demand for algorithms for automatic cell classification and characterization. Several cell classification ontologies exist with complementary information, but they must be merged to combine that information synergistically. The main difficulty in merging is matching the ontologies, since they use different naming conventions. Therefore, we developed an algorithm that merges ontologies by integrating the name matching between class label names with the structure mapping between the ontology elements based on graph convolution. Since structure mapping is a time-consuming process, we designed two methods to perform the graph convolution: vectorial structure matching and constraint-based structure matching. To perform the vectorial structure matching, we designed a general method to calculate the similarities between vectors of different lengths for different metrics. Additionally, we adapted the slower Blondel method to work for structure matching. We implemented our algorithms in FOntCell, a software module in Python for efficient automatic parallel-computed merging/fusion of ontologies in the same or similar knowledge domains. FOntCell can unify dispersed knowledge from one domain into a unique ontology in OWL format and iteratively reuse it to continuously adapt ontologies with new data endlessly produced by data-driven classification methods, such as those of the Human Cell Atlas. To make the merged ontologies easy to navigate, it generates HTML files with tabulated and graphic summaries, and interactive circular Directed Acyclic Graphs. We used FOntCell to merge the CELDA, LifeMap and LungMAP Human Anatomy cell ontologies into a comprehensive cell ontology.
We compared FOntCell with tools used for the mouse and human anatomy ontology alignment task proposed by the Ontology Alignment Evaluation Initiative (OAEI) and found that the Fβ alignment accuracies of FOntCell are above the geometric mean of the other tools; more importantly, it significantly outperforms the best OAEI tools in cell ontology alignment in terms of Fβ alignment accuracies.
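
For readers unfamiliar with the Fβ score used above, the following is a minimal sketch of how it can be computed for an alignment, using the standard definition Fβ = (1 + β²)·P·R / (β²·P + R), where P is precision and R is recall. It assumes that an alignment is represented as a set of (source class, target class) pairs; the function name `alignment_scores` and this representation are illustrative assumptions, not part of FOntCell.

```python
def alignment_scores(predicted, reference, beta=1.0):
    """Return (precision, recall, F-beta) of a predicted alignment.

    `predicted` and `reference` are iterables of (source, target) pairs.
    This pair representation is an assumption made for illustration.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)

    # Precision: fraction of predicted mappings that are in the reference.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of reference mappings that were predicted.
    recall = true_positives / len(reference) if reference else 0.0

    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return precision, recall, 0.0
    f_beta = (1 + beta ** 2) * precision * recall / denominator
    return precision, recall, f_beta


# Toy example with made-up class labels (not real ontology data).
toy_predicted = {("a", "x"), ("b", "y"), ("c", "z")}
toy_reference = {("a", "x"), ("b", "y"), ("d", "w"), ("e", "v")}
precision, recall, f1 = alignment_scores(toy_predicted, toy_reference)
```

With β = 1 the score weights precision and recall equally; β > 1 favours recall and β < 1 favours precision.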

Highlights

  • Precision biomedicine technologies produce ever-growing quantities of information from high-throughput data on finer-grained biomedical samples, reaching single-cell (Hwang et al, 2018) and subcellular (Grindberg et al, 2013) levels, which allow the discovery of new cell types (Boldog et al, 2018; Gerovska and Araúzo-Bravo, 2019; Sas et al, 2020)

  • To find the optimal parameters of FOntCell for the merging of CELDA with LifeMap, we performed a two-dimensional scan of the alignment parameters: the local name threshold, θLN, and the window length, W, in the ranges [0.1, 0.8] and [1, 8], respectively, using steps of 0.1 for θLN and 1 for W, for all structure mapping metrics: the three vectorial structure matching methods (Euclidean, Pearson, and cosine), the constraint-based structure matching, and the Blondel structure matching (Figure 3A)

  • We selected the best-performing tools that we found during the alignment of mouse and human anatomy ontologies (Table 3): StringEquiv, AML and LogMap, and ran them with their default parameters to compare their performance with FOntCell on the alignment of CELDA and LifeMap
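
The parameter scan described in the highlights can be sketched as a plain grid search over θLN, W and the structure mapping method. This is only an illustration under stated assumptions: the `evaluate` callback, which would run one alignment and return a score such as Fβ, is hypothetical, as are the method labels and the function name `scan_parameters`; none of them is FOntCell's actual API.

```python
from itertools import product

# Hypothetical labels for the five structure mapping methods named in the text.
METHODS = ["euclidean", "pearson", "cosine", "constraint", "blondel"]
# Local name threshold: 0.1 to 0.8 in steps of 0.1 (rounded to avoid float drift).
THRESHOLDS = [round(0.1 * k, 1) for k in range(1, 9)]
# Window length W: 1 to 8 in steps of 1.
WINDOWS = list(range(1, 9))


def scan_parameters(evaluate):
    """Score every (method, threshold, window) combination.

    `evaluate(method, threshold, window)` is an assumed callback that runs
    one alignment and returns a numeric score where higher is better.
    Returns the full score table and the best-scoring combination.
    """
    scores = {}
    best_key, best_score = None, None
    for method, threshold, window in product(METHODS, THRESHOLDS, WINDOWS):
        score = evaluate(method, threshold, window)
        scores[(method, threshold, window)] = score
        # Keep the first combination that reaches the highest score.
        if best_score is None or score > best_score:
            best_key, best_score = (method, threshold, window), score
    return scores, best_key


# Toy scoring function, used only to show the call shape (not real results).
def toy_evaluate(method, threshold, window):
    bonus = 1.0 if method == "cosine" else 0.0
    return bonus - abs(threshold - 0.5) - abs(window - 3)


scores, best = scan_parameters(toy_evaluate)
```

The grid has 5 × 8 × 8 = 320 combinations, so an exhaustive scan is feasible as long as a single alignment run is reasonably fast.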

Introduction

Precision biomedicine technologies produce ever-growing quantities of information from high-throughput data on finer-grained biomedical samples, reaching single-cell (Hwang et al, 2018) and subcellular (Grindberg et al, 2013) levels, which allow the discovery of new cell types (Boldog et al, 2018; Gerovska and Araúzo-Bravo, 2019; Sas et al, 2020). These increasingly precise cell data render existing cell classification systems obsolete and create a demand for automatic, comprehensive, data-driven cell classification methods. In the case of cell ontologies, there are several cell type classifications in various formats; the most frequently used is the Web Ontology Language (OWL) (Smith et al, 2004) format, which encompasses the vast majority of Open Biomedical Ontologies (OBO) Foundry (Smith et al, 2007) ontologies.
