Abstract

The Generation Challenge Programme (GCP; "http://www.generationcp.org":http://www.generationcp.org) is a globally distributed crop research consortium directed toward crop improvement through the application of comparative biology and genetic resources characterization to plant breeding. GCP adopted the development paradigm of a ‘model-driven architecture’ to achieve the interoperability and integration of diverse GCP data types that are available through distributed data sources and consumed by end-user data analysis tools. Its objective is to ensure semantic compatibility across the Consortium, leading to the creation of robust global public goods from GCP research results. At the core of the GCP architecture is a scientific domain model, an object model that encapsulates key crop science concepts and is documented using Unified Modeling Language (see GCP Models on "http://pantheon.generationcp.org/index.php":http://pantheon.generationcp.org/index.php). The model is heavily parameterized with GCP-indexed ontology terms. The GCP-indexed ontology reuses established international standards where available, converts other publicly available controlled vocabularies into formally managed ontologies, and develops new ontologies where no public vocabulary yet exists. General and crop-specific GCP ontologies are being developed by crop teams involving GCP and external scientific experts, in particular for crop-specific ontologies relating to plant anatomy, developmental stage, trait and phenotype for selected GCP crops. Crop ontologies are being developed for chickpea, maize, Musa, potato, rice, sorghum and wheat. The Bioversity crop descriptor lists, already loaded into OBO format files, provide the primary structure for developing the crop ontologies. Terms to be mapped to the ontologies are then extracted from the crop databases where trait values have been stored by crop scientists.
These sources allow the ontology teams to identify the most commonly used concept names and their interrelations. Experts validate the selection of keywords that will make up the controlled vocabulary. These GCP ontologies will allow researchers and end users to query keywords related to traits, plant structure, growth stage, and molecular function, and link them to associated phenotyping and genotyping data sets, including data on germplasm, crop physiology, geographic information, genes, QTL, etc. To reach that stage, the crop ontologies will be integrated into the data-entry user interface or data templates as picklists, facilitating data annotation and the submission of new terms. In addition, the GCP ontologies will be integrated with Plant Ontology (PO) and Gramene (Trait Ontology, TO; Environment Ontology, EO) to develop a common, internationally shared crop trait and anatomy ontology. The team will initiate collaboration with SONet (Scientific Observations Network) and OBOE (Extensible Observation Ontology), which have proposed to integrate the GCP ontology as a case study. The Open Biomedical Ontologies (OBO) edit tool has been used to develop the ontologies for rice, wheat and maize traits, which are currently available at "http://cropforge.org/projects/gcpontology/":http://cropforge.org/projects/gcpontology/. The crop-specific work plans and ontologies related to other materials are published at "http://pantheon.generationcp.org":http://pantheon.generationcp.org. The development and curation of general-purpose ontologies will continue, and the results will be made available on the Pantheon and CropForge websites.