GenoMetric Query Language: a novel approach to large-scale genomic data management.

Marco Masseroli, Francesco Venco, Vahid Jalili, Stefano Ceri, Pietro Pinoli, Abdulrahman Kaitoua, Heiko Muller, Fernando Palluzzi

doi:10.1093/bioinformatics/btv048


Open Access


Abstract

Improvements in sequencing technologies and data processing pipelines are rapidly producing sequencing data, with associated high-level features, for many individual genomes in multiple biological and clinical conditions. These data allow for data-driven genomic, transcriptomic and epigenomic characterizations, but they require state-of-the-art 'big data' computing strategies, with abstraction levels beyond the capabilities of available tools. We propose a high-level, declarative GenoMetric Query Language (GMQL) and a toolkit for its use. GMQL operates downstream of raw data preprocessing pipelines and supports queries over thousands of heterogeneous datasets and samples; as such, it is key to genomic 'big data' analysis. GMQL leverages a simple data model that provides both abstractions of genomic region data and associated experimental, biological and clinical metadata, and interoperability between many data formats. Built on the Hadoop framework and the Apache Pig platform, GMQL ensures high scalability, expressivity, flexibility and simplicity of use, as demonstrated by several biological query examples on ENCODE and TCGA datasets. The GMQL toolkit is freely available for non-commercial use at http://www.bioinformatics.deib.polimi.it/GMQL/.
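
The data model described in the abstract pairs genomic region data with per-sample metadata. The following is a minimal Python sketch of that idea, not GMQL syntax and not the authors' implementation: the names `Region`, `Sample`, `select_samples` and `count_overlaps`, the half-open coordinate convention, and the toy values are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Region:
    """A genomic region; half-open [start, stop) is an assumption of this sketch."""
    chrom: str
    start: int
    stop: int
    strand: str = "*"


@dataclass
class Sample:
    """One sample: its regions plus free-form attribute/value metadata."""
    regions: list[Region]
    metadata: dict[str, str] = field(default_factory=dict)


def select_samples(dataset, **conditions):
    """Keep the samples whose metadata matches every key=value condition."""
    return [s for s in dataset
            if all(s.metadata.get(k) == v for k, v in conditions.items())]


def count_overlaps(reference, sample):
    """For each reference region, count the sample regions that overlap it."""
    return [
        sum(1 for r in sample.regions
            if r.chrom == ref.chrom and r.start < ref.stop and ref.start < r.stop)
        for ref in reference
    ]


# Toy data, invented for this example only.
dataset = [
    Sample([Region("chr1", 100, 200), Region("chr1", 500, 650)],
           {"source": "ENCODE", "assay": "ChIP-seq"}),
    Sample([Region("chr2", 10, 90)],
           {"source": "TCGA", "assay": "RNA-seq"}),
]
promoters = [Region("chr1", 150, 300), Region("chr2", 0, 50)]

chip = select_samples(dataset, assay="ChIP-seq")   # metadata-driven selection
counts = count_overlaps(promoters, chip[0])        # region-based computation
print(counts)
```

The sketch separates the two kinds of operation the abstract implies: choosing samples by their metadata, then computing over their regions. A naive nested loop like this would not scale to thousands of samples, which is the gap a Hadoop-based engine is meant to fill.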


Journal: Bioinformatics	Publication Date: Feb 3, 2015
Citations: 83	License type: other-oa


