Abstract

Background: Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework.

Results: We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology. We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes.

Conclusions: The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics that enhances the utility of model genetic organisms and can be readily applied to species with fewer genetic resources and less well-characterized genomes. In addition, these tools should enhance future efforts to explore the relationships among phenotypic similarity, gene function, and sequence similarity in plants, and to make genotype-to-phenotype predictions relevant to plant biology, crop improvement, and potentially even human health.

Electronic supplementary material: The online version of this article (doi:10.1186/s13007-015-0053-y) contains supplementary material, which is available to authorized users.

Highlights

  • Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies

  • A method for describing phenotypes with a common semantic representation across six plant species. We include in the Results a brief description of our method, because this is the first report outlining this type of analysis of phenotypes across multiple reference species in plants

  • Although we found that all phenotypes in our dataset could be described with the ontologies listed in Table 1, we recognize that our dataset does not encompass the entire breadth of possible plant phenotypes, and that additional ontologies and further development of existing ontologies will be needed to annotate more diverse phenotypes
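
To make the idea of a common semantic representation more concrete, the following is a minimal sketch of how ontology-based phenotype annotations might be compared across genes. It is not the authors' workflow. It assumes an entity-quality style of annotation (a structure term paired with a quality term), uses a tiny invented hierarchy in place of the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology, and scores similarity with a simple Jaccard index over ancestor-closed term sets, which is only one of several measures in common use. All term identifiers, gene names, and function names below are hypothetical.

```python
from dataclasses import dataclass

# Toy is_a hierarchy (child -> parents). Every identifier here is an invented
# placeholder, not a real Plant Ontology, Gene Ontology, or PATO term ID.
TOY_PARENTS = {
    "TOY:leaf": ["TOY:plant_structure"],
    "TOY:root": ["TOY:plant_structure"],
    "TOY:plant_structure": [],
    "TOY:decreased_size": ["TOY:size"],
    "TOY:increased_size": ["TOY:size"],
    "TOY:size": ["TOY:quality"],
    "TOY:quality": [],
}


@dataclass(frozen=True)
class Annotation:
    """One phenotype statement: an affected entity plus a quality (assumed format)."""
    entity: str
    quality: str


def ancestors(term, parents=TOY_PARENTS):
    """Return the term itself together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def closure(annotations):
    """Collect every term used by the annotations, plus all ancestors."""
    terms = set()
    for ann in annotations:
        terms |= ancestors(ann.entity) | ancestors(ann.quality)
    return terms


def jaccard(annotations_a, annotations_b):
    """Jaccard index of the two ancestor-closed term sets (0.0 if both are empty)."""
    a, b = closure(annotations_a), closure(annotations_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical genes from two species, annotated with the same vocabulary.
gene_a = [Annotation("TOY:leaf", "TOY:decreased_size")]
gene_b = [Annotation("TOY:root", "TOY:decreased_size")]
gene_c = [Annotation("TOY:leaf", "TOY:decreased_size")]

if __name__ == "__main__":
    print(jaccard(gene_a, gene_b))  # shared quality, different structure
    print(jaccard(gene_a, gene_c))  # identical annotations
```

Because both genes are described with the same vocabulary, the comparison does not depend on which species or source database an annotation came from. In this toy setting the leaf and root phenotypes share four of six terms after ancestor expansion, so they score as partially similar rather than unrelated.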

METHODOLOGY

Anika Oellrich, Ramona L Walls, Ethalinda KS Cannon, Steven B Cannon, Laurel Cooper, Jack Gardiner, Georgios V Gkoutos, Lisa Harper, Mingze He, Robert Hoehndorf, Pankaj Jaiswal, Scott R Kalberer, John P Lloyd, David Meinke, Naama Menda, Laura Moore, Rex T Nelson, Anuradha Pujar, Carolyn J Lawrence and Eva Huala
