Abstract

In silico prediction of plant performance is attracting increasing attention from breeders. Several statistical, mathematical and machine learning methodologies for analyzing phenotypic, omics and environmental data typically use only one or a few data layers. Genomic selection is one application in which heterogeneous data, such as those from omics technologies, are handled while accommodating several genetic models of inheritance. Many new high-throughput Next Generation Sequencing (NGS) platforms on the market produce whole-genome data at low cost. Hence, large-scale genomic data can be produced and analyzed, enabling intercrosses and fast-paced recurrent selection: the properties of offspring can be predicted instead of being evaluated manually in the field. One of the major challenges in commercial breeding is that breeders have only a short time window to make decisions once they receive the data. To implement genomic selection routinely as part of breeding programs, data management systems and analytics capacity therefore have to be in place. Traditional relational database management systems (RDBMS), which are designed to store, manage and analyze large-scale data, offer appealing characteristics, particularly when upgraded with capabilities for working with binary large objects (BLOBs). In addition, NoSQL systems have been considered effective tools for managing high-dimensional genomic data; MongoDB, a document-based NoSQL database, has been used effectively to develop web-based tools for visualizing and exploring genotypic information. The Hierarchical Data Format (HDF5), a high-performance file format and library for large scientific datasets, demonstrated superior performance with high-dimensional and highly structured data such as genomic sequencing data.
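To make the BLOB point concrete, here is a minimal sketch that stores each line's marker genotypes as one binary large object in a relational table. It assumes SQLite (bundled with Python's standard library) as a stand-in RDBMS, a hypothetical `genotypes` table, and a 0/1/2 dosage coding with -1 for missing calls; none of these choices come from the DataBio implementation.

```python
import sqlite3
from array import array

# Hypothetical schema: one row per sample, all marker dosages packed into one BLOB.
SCHEMA = """
CREATE TABLE IF NOT EXISTS genotypes (
    sample_id TEXT PRIMARY KEY,
    n_markers INTEGER NOT NULL,
    dosages   BLOB NOT NULL
)
"""


def store_genotypes(conn, sample_id, dosages):
    """Pack 0/1/2 dosages (-1 = missing) as one signed byte per marker."""
    blob = array("b", dosages).tobytes()
    conn.execute(
        "INSERT OR REPLACE INTO genotypes (sample_id, n_markers, dosages) "
        "VALUES (?, ?, ?)",
        (sample_id, len(dosages), blob),
    )


def load_genotypes(conn, sample_id):
    """Read one sample's BLOB back into a list of integer dosages."""
    row = conn.execute(
        "SELECT n_markers, dosages FROM genotypes WHERE sample_id = ?",
        (sample_id,),
    ).fetchone()
    if row is None:
        raise KeyError(sample_id)
    n_markers, blob = row
    dosages = array("b")
    dosages.frombytes(blob)
    if len(dosages) != n_markers:
        raise ValueError("stored BLOB length does not match n_markers")
    return dosages.tolist()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
store_genotypes(conn, "line_001", [0, 1, 2, 2, 0, -1])  # hypothetical sample
print(load_genotypes(conn, "line_001"))  # [0, 1, 2, 2, 0, -1]
```

Keeping a whole genotype vector in one BLOB avoids one table row per marker call, which is what makes RDBMS storage of dense marker data manageable. One byte per call is the simplest encoding; a 2-bit packing, as used by PLINK `.bed` files, would cut storage roughly fourfold.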

Highlights

  • New and cheap local sensor techniques as well as advances in remote sensing and geo-information systems provide extensive descriptions of the environmental conditions under which plants grow. This allows in silico prediction of plant performance depending on genotype, environment and crop management

  • Genomic and other omics data were produced for sorghum (Sorghum bicolor (L.) Moench) and tomato (Solanum lycopersicum L.) crops (Fig. 6.2) evaluated in the DataBio Genomics pilots. Four categories of data were produced (Tables 6.1 and 6.2): (1) in situ sensor and farm data, (2) genomic data from plant breeding efforts in greenhouses and open fields, produced using Next Generation Sequencers (NGS), (3) biochemical data produced by chromatographs (LC/MS/MS, GC/MS, HPLC), wet chemistry and NIRS, and (4) genomics modelling output represented by integrative analytics information

  • Genomics data used in the DataBio project were obtained by resequencing the genomic DNA (deoxyribonucleic acid) of the plant species of interest on the Illumina sequencing platform, which consists of high-throughput Next Generation Sequencers
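Resequencing output is commonly summarized as variant calls before it reaches a genomic selection model. The text does not say which file formats DataBio used, so the following is only a minimal sketch assuming VCF input with biallelic SNPs and the common 0/1/2 coding (number of ALT alleles, -1 for a missing call); the sample names and variant lines are hypothetical.

```python
def gt_to_dosage(gt):
    """Count ALT alleles in a biallelic GT string such as '0/1' or '1|1'.

    Returns -1 when any allele is missing ('.')."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return -1
    if any(a not in ("0", "1") for a in alleles):
        raise ValueError(f"not a biallelic genotype: {gt!r}")
    return sum(a == "1" for a in alleles)


def vcf_to_dosage_matrix(lines):
    """Build {sample: [dosage per variant]} from the lines of a VCF file."""
    samples, matrix = [], {}
    for line in lines:
        if not line.strip() or line.startswith("##"):  # blank or meta-information line
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):  # header: sample names start at column 10
            samples = fields[9:]
            matrix = {s: [] for s in samples}
            continue
        gt_index = fields[8].split(":").index("GT")  # position of GT within FORMAT
        for sample, value in zip(samples, fields[9:]):
            matrix[sample].append(gt_to_dosage(value.split(":")[gt_index]))
    return matrix


# Hypothetical two-sample, two-variant VCF excerpt.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tline_A\tline_B",
    "chr01\t1050\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/0:12\t0/1:9",
    "chr01\t2210\t.\tC\tT\t60\tPASS\t.\tGT:DP\t1|1:15\t./.:0",
]
print(vcf_to_dosage_matrix(vcf))  # {'line_A': [0, 2], 'line_B': [1, -1]}
```

The resulting per-line marker vectors are the input that genomic prediction models consume; multi-allelic sites would need to be split or filtered first, which this sketch rejects with a `ValueError`.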


Summary

Introduction

The array of techniques for probing complex biological systems such as (crop) plants is continuously expanding, providing unprecedented data on multiple phenotypic layers as well as multiple omics layers (genome, proteome, metabolome, epigenome or methylome, and more). New, inexpensive local sensor techniques, together with advances in remote sensing and geo-information systems, provide extensive descriptions of the environmental conditions under which plants grow. This allows in silico prediction of plant performance (e.g. traits such as yield and resistance to abiotic and biotic stresses) depending on genotype, environment and crop management. Genomic selection is one such application, in which heterogeneous data, such as those from genomics, metabolomics and phenomics technologies, are handled while accounting for several genetic models of inheritance [1].
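A common way to illustrate genomic selection is ridge regression on marker dosages, the estimator behind RR-BLUP when the penalty equals the ratio of residual to marker-effect variance. The text does not specify which model DataBio used, so this is only a minimal, dependency-free sketch assuming additive 0/1/2 coding, a user-chosen penalty `lam > 0`, and hypothetical training data; it fits marker effects on phenotyped lines and predicts a genotyped but unphenotyped line.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(a)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(Z, y, lam):
    """Marker effects b from (Zc'Zc + lam*I) b = Zc'(y - mean(y)),
    where Zc is the marker matrix with its column means removed."""
    n, p = len(Z), len(Z[0])
    col_means = [sum(row[j] for row in Z) / n for j in range(p)]
    Zc = [[row[j] - col_means[j] for j in range(p)] for row in Z]
    mu = sum(y) / n  # unpenalized intercept = trait mean, since Zc is centered
    yc = [v - mu for v in y]
    lhs = [[sum(r[i] * r[j] for r in Zc) + (lam if i == j else 0.0) for j in range(p)]
           for i in range(p)]
    rhs = [sum(r[i] * v for r, v in zip(Zc, yc)) for i in range(p)]
    return mu, col_means, solve(lhs, rhs)


def predict(model, z):
    """Predicted trait value for a new line from its marker dosages z."""
    mu, col_means, effects = model
    return mu + sum(b * (g - m) for b, g, m in zip(effects, z, col_means))


# Hypothetical training data: 5 lines x 2 markers (0/1/2 dosages) and one trait.
genotypes = [[0, 0], [1, 0], [0, 2], [2, 1], [1, 2]]
phenotypes = [10.0, 13.0, 8.0, 15.0, 11.0]
model = fit_ridge(genotypes, phenotypes, lam=1.0)
print(predict(model, [2, 2]))  # prediction for an unphenotyped line
```

Solving the p x p system is fine for a handful of markers; with tens of thousands of markers and far fewer lines, an equivalent n x n (kernel) formulation or a numerical linear algebra library would be the practical choice.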

Genomic and Other Omics Data in DataBio
Genomic Data Management Systems