Abstract

The chances of raising crop productivity to enhance global food security would be greatly improved if we had a complete understanding of all the biological mechanisms that underpinned traits such as crop yield, disease resistance or nutrient and water use efficiency. With more crop genomes emerging all the time, we are nearer having the basic information, at the gene-level, to begin assembling crop gene catalogues and using data from other plant species to understand how the genes function and how their interactions govern crop development and physiology. Unfortunately, the task of creating such a complete knowledge base of gene functions, interaction networks and trait biology is technically challenging because the relevant data are dispersed in myriad databases in a variety of data formats with variable quality and coverage. In this paper we present a general approach for building genome-scale knowledge networks that provide a unified representation of heterogeneous but interconnected datasets to enable effective knowledge mining and gene discovery. We describe the datasets and outline the methods, workflows and tools that we have developed for creating and visualising these networks for the major crop species, wheat and barley. We present the global characteristics of such knowledge networks and with an example linking a seed size phenotype to a barley WRKY transcription factor orthologous to TTG2 from Arabidopsis, we illustrate the value of integrated data in biological knowledge discovery. The software we have developed (www.ondex.org) and the knowledge resources (http://knetminer.rothamsted.ac.uk) we have created are all open-source and provide a first step towards systematic and evidence-based gene discovery in order to facilitate crop improvement.

Highlights

  • The development of improved agricultural crops is a critical societal challenge, given current global developments such as population growth, climate and environmental change, and the increasing scarcity of inputs needed for agricultural productivity

  • The use of forward genetics, reverse genetics and “omics” technologies combined with bioinformatics approaches to discover causal genetic loci and variants that determine a [...]

Abbreviations: GSKN, genome-scale knowledge network; CropNet, crop-specific knowledge network; RefNet, reference knowledge network of model species

  • We have developed reproducible workflows to integrate multiple public data sources from crop and model species into genome-scale knowledge networks (GSKN)

Introduction

The development of improved agricultural crops is a critical societal challenge, given current global developments such as population growth, climate and environmental change, and the increasing scarcity of inputs (fuel, fertiliser, etc.) needed for agricultural productivity. Generating hypotheses that link genotype to phenotype, and identifying the candidate biological pathways, processes and functional genes that could be involved, requires the integration of multiple heterogeneous types of information. This information is spread across many different databases (Rigden et al., 2016) and can include known gene-phenotype links, gene-disease associations, gene expression and co-expression, allelic information and effects of genetic variation, links to scientific literature, homology relations, protein-protein interactions, gene regulation, protein pathway memberships, gene-ontology annotations, protein-domain information and other domain-specific information. Because life science data are this complex and interconnected, networks, consisting of nodes and links between them, offer a flexible data model that can capture some of that complexity (Huber et al., 2007).
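To make the node-and-link data model concrete, the following is a minimal Python sketch of a typed knowledge network. It is an illustration only and not the Ondex or KnetMiner API: the class names, node identifiers (`barley_wrky`, `at_ttg2`, `seed_size`) and relation names (`ortholog_of`, `has_phenotype`) are hypothetical, chosen to mirror the seed-size example mentioned in the abstract.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str    # hypothetical identifier, not a real database accession
    node_type: str  # e.g. "Gene" or "Phenotype"
    label: str      # human-readable description


class KnowledgeNetwork:
    """Toy undirected network with typed nodes and labelled links."""

    def __init__(self):
        self.nodes = {}
        # node_id -> list of (relation, neighbour_id) pairs
        self.edges = defaultdict(list)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, source_id, relation, target_id):
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both endpoints must be added before linking them")
        # Store the link in both directions so it can be traversed either way.
        self.edges[source_id].append((relation, target_id))
        self.edges[target_id].append((relation, source_id))

    def find_path(self, start_id, target_type):
        """Breadth-first search for the shortest chain of links from
        start_id to the nearest node of the requested type.
        Returns [node, relation, node, ...] or None if nothing is reachable."""
        queue = deque([(start_id, [start_id])])
        seen = {start_id}
        while queue:
            current, path = queue.popleft()
            if self.nodes[current].node_type == target_type:
                return path
            for relation, neighbour in self.edges[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, path + [relation, neighbour]))
        return None


def build_example_network():
    # Assumed, illustrative content: a barley gene linked to a phenotype
    # through its Arabidopsis orthologue.
    net = KnowledgeNetwork()
    net.add_node(Node("barley_wrky", "Gene", "barley WRKY transcription factor"))
    net.add_node(Node("at_ttg2", "Gene", "Arabidopsis TTG2"))
    net.add_node(Node("seed_size", "Phenotype", "seed size"))
    net.add_edge("barley_wrky", "ortholog_of", "at_ttg2")
    net.add_edge("at_ttg2", "has_phenotype", "seed_size")
    return net


if __name__ == "__main__":
    network = build_example_network()
    print(network.find_path("barley_wrky", "Phenotype"))
```

Because each data source only contributes nodes and labelled links, a new evidence type (for example co-expression or literature co-occurrence) could be added in this sketch without changing the traversal code, which is the flexibility the network model is meant to provide.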

