Toward Synthesizing Our Knowledge of Morphology: Using Ontologies and Machine Reasoning to Extract Presence/Absence Evolutionary Phenotypes across Studies.

T Alexander Dececchi, Paula M Mabee, James P Balhoff, Hilmar Lapp

doi:10.1093/sysbio/syv031

Abstract

The reality of larger and larger molecular databases and the need to integrate data scalably have presented a major challenge for the use of phenotypic data. Morphology is currently primarily described in discrete publications, entrenched in noncomputer readable text, and requires enormous investments of time and resources to integrate across large numbers of taxa and studies. Here we present a new methodology, using ontology-based reasoning systems working with the Phenoscape Knowledgebase (KB; kb.phenoscape.org), to automatically integrate large amounts of evolutionary character state descriptions into a synthetic character matrix of neomorphic (presence/absence) data. Using the KB, which includes more than 55 studies of sarcopterygian taxa, we generated a synthetic supermatrix of 639 variable characters scored for 1051 taxa, resulting in over 145,000 populated cells. Of these characters, over 76% were made variable through the addition of inferred presence/absence states derived by machine reasoning over the formal semantics of the source ontologies. Inferred data reduced the missing data in the variable character-subset from 98.5% to 78.2%. Machine reasoning also enables the isolation of conflicts in the data, that is, cells where both presence and absence are indicated; reports regarding conflicting data provenance can be generated automatically. Further, reasoning enables quantification and new visualizations of the data, here for example, allowing identification of character space that has been undersampled across the fin-to-limb transition. The approach and methods demonstrated here to compute synthetic presence/absence supermatrices are applicable to any taxonomic and phenotypic slice across the tree of life, providing the data are semantically annotated. Because such data can also be linked to model organism genetics through computational scoring of phenotypic similarity, they open a rich set of future research questions into phenotype-to-genome relationships.
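
The two matrix-level ideas in the abstract, conflict isolation and the missing-data fraction, can be illustrated with a minimal Python sketch. It is not the Phenoscape implementation: the taxa, characters, study labels, and the function names `build_matrix`, `find_conflicts`, and `missing_fraction` are all hypothetical and chosen only for illustration. A cell counts as a conflict when both presence and absence have been recorded for the same taxon and character, and the provenance of each state is kept so that a report can be generated.

```python
from collections import defaultdict

# Hypothetical toy assertions: (taxon, character, state, provenance).
# None of these values come from the Phenoscape Knowledgebase.
ASSERTIONS = [
    ("Taxon A", "pectoral fin", "present", "Study 1 (asserted)"),
    ("Taxon A", "pectoral fin", "absent", "Study 2 (inferred)"),
    ("Taxon B", "pectoral fin", "present", "Study 1 (asserted)"),
    ("Taxon B", "digit", "absent", "Study 3 (asserted)"),
]


def build_matrix(assertions):
    """Group assertions into cells keyed by (taxon, character)."""
    cells = defaultdict(lambda: {"states": set(), "sources": []})
    for taxon, character, state, source in assertions:
        cell = cells[(taxon, character)]
        cell["states"].add(state)
        cell["sources"].append((state, source))
    return dict(cells)


def find_conflicts(cells):
    """Return the provenance of every cell scored both present and absent."""
    return {
        key: cell["sources"]
        for key, cell in cells.items()
        if cell["states"] == {"present", "absent"}
    }


def missing_fraction(cells, taxa, characters):
    """Fraction of taxon-by-character cells that have no state at all."""
    total = len(taxa) * len(characters)
    populated = sum(1 for t in taxa for c in characters if (t, c) in cells)
    return 1 - populated / total


if __name__ == "__main__":
    matrix = build_matrix(ASSERTIONS)
    taxa = ["Taxon A", "Taxon B"]
    characters = ["pectoral fin", "digit"]
    print(find_conflicts(matrix))
    print(missing_fraction(matrix, taxa, characters))  # 3 of 4 cells populated -> 0.25
```

Keeping the source label next to each state is a design choice that makes the conflict report a simple lookup; the percentages quoted in the abstract come from the authors' full matrix, not from arithmetic on toy data like this.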

Highlights

Unlike molecular data, phenotypic data are notoriously time-consuming and complex to observe, classify, and code (Burleigh et al. 2013).
Anatomical entities are represented by terms from the comprehensive Uberon anatomy ontology for metazoan animals (Mungall et al. 2012; Haendel et al. 2014), which was derived in part from independently developed vertebrate multispecies ontologies.
The phenotypic features that characterize and define evolutionary groups are currently scattered across the dispersed literature of comparative biology, often in character-by-taxon matrices for small sets of taxa.

Summary

Introduction

Phenotypic data are notoriously time-consuming and complex to observe, classify, and code (Burleigh et al. 2013). They are described in highly detailed free text across a distributed literature and have not been available in a computable format (Deans et al. 2012). Where a description records a quality of a structure rather than its presence or absence, presence and absence must be inferred; from the description “posterior flap of adipose fin, free from back and caudal fin” (Lundberg 1992), the adipose fin would be assumed present. Such detailed data, originally collected for phylogenetic reconstruction or taxonomic identification, are desirable for re-use at the more general level of presence and absence, where they pertain to broader questions concerning, for example, homoplasy, rates, and correlations of phenotype with environment, geography, and genes. The methods described here allow aggregation of phenotypic data into synthetic supermatrices and show the need for broader adoption of ontology annotation in the morphological literature to facilitate linking and integration with other data, such as genetic data.
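
The adipose fin example can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the reasoner the authors used: the `PART_OF` table is a hypothetical toy hierarchy rather than an excerpt from Uberon, the function names are invented, and the second rule (an absent whole implies absent parts) is an assumption added here to show the symmetric case.

```python
# Hypothetical toy part_of hierarchy (part -> whole); not taken from Uberon.
PART_OF = {
    "posterior flap of adipose fin": "adipose fin",
    "pectoral fin radial": "pectoral fin",
}


def wholes(entity, part_of=PART_OF):
    """Walk up the hierarchy and list every structure that contains `entity`."""
    result = []
    while entity in part_of:
        entity = part_of[entity]
        result.append(entity)
    return result


def parts(entity, part_of=PART_OF):
    """List every structure that is directly or indirectly part of `entity`."""
    found = []
    for part, whole in part_of.items():
        if whole == entity:
            found.append(part)
            found.extend(parts(part, part_of))
    return found


def infer_present(described_entity):
    """A described part implies the part and all containing structures are present."""
    states = {described_entity: "present"}
    for whole in wholes(described_entity):
        states[whole] = "present"
    return states


def infer_absent(absent_entity):
    """Assumed rule: an absent whole implies that all of its parts are absent."""
    states = {absent_entity: "absent"}
    for part in parts(absent_entity):
        states[part] = "absent"
    return states


if __name__ == "__main__":
    # A description of the flap yields an inferred "present" for the adipose fin.
    print(infer_present("posterior flap of adipose fin"))
    print(infer_absent("adipose fin"))
```

A real ontology has many relation types and a much deeper hierarchy, so a description-logic reasoner would replace the dictionary walk above; the sketch only shows the direction in which each inference flows.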



Journal: Systematic Biology	Publication Date: May 26, 2015
Citations: 85	License type: cc-by
