High-performance integrated virtual environment (HIVE): a robust infrastructure for next-generation sequence data analysis.

Vahan Simonyan, Yang Pan, Anton Golikov, Ekaterina Osipova, Phuc Vinh Nguyen Lam, Hayley Dingerdissen, Naila Gulzar, Olesja Muravitskaja, Scott Goldweber, Jing Wang, John Torcivia-Rodriguez, Raja Mazumder, Alexey Pschenichnov, Thomas Maudru, Krista Smith, Konstantinos Karagiannis, Alexandre Rostovtsev, Alin Voskanian, Tsung-Jung Wu, Elaine E Thompson, Konstantin Chumakov, Valery Tkachenko, Qing Wan, Luis V Santana‐Quintero, Carolyn A Wilson, William J Faison

doi:10.1093/database/baw022

Abstract

The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes in web-based visual environments built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server that virtualizes services, not processes. The abstraction layer this introduces between computational requests and operating system processes makes HIVE both robust and flexible. The novel paradigm of moving computations to the data, instead of moving data to computational nodes, has proven to be significantly less taxing for both hardware and network infrastructure. The honeycomb data model developed for HIVE integrates metadata into an object-oriented model. It differs from other object-oriented databases in additionally implementing a unified application program interface to search, view and manipulate data of all types. This model simplifies the introduction of new data types, thereby minimizing the need for database restructuring and streamlining the development of new integrated information systems. The honeycomb model employs a highly secure hierarchical access control and permission system that determines data access privileges at a fine granularity without flooding the security subsystem with a multiplicity of rules. HIVE infrastructure will allow engineers and scientists to perform NGS analysis in a manner that is both efficient and secure. HIVE is actively supported in public and private domains, and project collaborations are welcomed.

Database URL: https://hive.biochemistry.gwu.edu

Highlights

Many challenges associated with the analysis of extra-large next-generation sequencing (NGS) data result from the size and significance of these datasets
Outputs can be exported for external analysis or viewed internally through a diverse array of high-quality scientific visualizations
High-performance Integrated Virtual Environment (HIVE) facilitates the robust retrieval of NGS data from a variety of sources and the subsequent distributed storage of this data in a highly secure environment
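This distributed storage underpins the abstract's claim that HIVE moves computations to the data rather than data to computational nodes. The scheduling details are not given in this summary, so the Python sketch below only illustrates the general data-locality idea under assumed inputs: a hypothetical map from data chunks to the nodes holding replicas, and a hypothetical count of free compute slots per node. Each task is placed on a node that already stores its chunk when possible, and otherwise falls back to a remote node, which implies a network transfer.

```python
from __future__ import annotations


def schedule_by_locality(chunk_locations: dict[str, list[str]],
                         node_slots: dict[str, int]) -> dict[str, str]:
    """Map each data chunk to the node that should process it.

    chunk_locations: chunk id -> nodes that already store a replica (hypothetical).
    node_slots: node -> free compute slots (hypothetical).
    """
    free = dict(node_slots)  # copy so the caller's dict is not mutated
    plan: dict[str, str] = {}
    for chunk, holders in sorted(chunk_locations.items()):
        local = [n for n in holders if free.get(n, 0) > 0]
        if local:
            # Move the computation to the data: pick the replica holder with most spare slots.
            node = max(local, key=lambda n: free[n])
        else:
            # No local capacity: fall back to the node with most free slots. This implies
            # copying the chunk over the network and may oversubscribe if all slots are taken.
            node = max(free, key=lambda n: free[n])
        plan[chunk] = node
        free[node] -= 1
    return plan
```

Choosing the replica holder with the most free slots is only one simple tie-breaking rule; a production scheduler would also weigh queue length, replica health and network cost.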

Summary

Introduction

Many challenges associated with the analysis of extra-large next-generation sequencing (NGS) data result from the size and significance of these datasets. A comparative analysis of single nucleotide polymorphism (SNP) profiles for a family of viruses, aimed at finding determinants of virulence, requires parsing hundreds of millions of reads, tens of genomes and billions of bases, resulting in terabytes of information. This volume is projected to increase to the petabyte scale in the coming years [1,2,3,4], with similar trends predicted for most major biological databases [5,6]. We use HIVE both to support in-house research using and evaluating NGS, and to perform independent analysis as part of our evaluation of NGS data provided to the agency in support of medical product regulatory submissions.
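As a toy illustration of the comparative SNP-profile analysis mentioned above (not HIVE's own profiling algorithm, which this summary does not describe), the Python sketch below computes a per-position non-reference allele frequency from a simplified pileup and then reports positions where two samples differ. The pileup format, the `min_cov` coverage cutoff and the `min_delta` threshold are assumptions chosen for illustration.

```python
from __future__ import annotations
from collections import Counter


def snp_profile(reference: str, aligned_bases: dict[int, str],
                min_cov: int = 10) -> dict[int, float]:
    """Return {position: non-reference allele frequency} for sufficiently covered positions.

    aligned_bases maps a 0-based reference position to the string of base calls
    observed there across all reads (a simplified, hypothetical pileup).
    """
    profile: dict[int, float] = {}
    for pos, calls in aligned_bases.items():
        if len(calls) < min_cov:
            continue  # too few reads to estimate a frequency reliably
        counts = Counter(calls.upper())
        ref_base = reference[pos].upper()
        profile[pos] = 1.0 - counts[ref_base] / len(calls)
    return profile


def differential_sites(p1: dict[int, float], p2: dict[int, float],
                       min_delta: float = 0.2) -> list[int]:
    """Positions covered in both profiles whose SNP frequencies differ by at least min_delta."""
    shared = p1.keys() & p2.keys()
    return sorted(pos for pos in shared if abs(p1[pos] - p2[pos]) >= min_delta)
```

The per-site arithmetic is trivial; the difficulty the introduction describes comes from repeating it over billions of bases and hundreds of millions of reads, which is why a real pipeline would need to partition this counting across many nodes.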

Journal: Database
Publication Date: Jan 1, 2016
Citations: 69
License type: CC BY

Similar Papers

Abstract 1660: Identification of allelic imbalance utilizing heterozygous genotype allele frequencies and intensities
Kyle Chang ... Smruthy Sivakumar
Cancer Research | VOL. 79 | 01 Jul 2019

A review on advancements in feature selection and feature extraction for high-dimensional NGS data analysis.
Kasmika Borah ... Saurav Mallik
Functional & integrative genomics | VOL. 24 | 19 Aug 2024

Comparison of multiple algorithms to reliably detect structural variants in pears
Yueyuan Liu ... Jun Wu
BMC Genomics | VOL. 21 | 20 Jan 2020

Abstract 2280: A comprehensive sample tracking and data processing workflow for next generation sequencing
Chandra Sekhar Pedamallu ... Donald Jackson
Cancer Research | VOL. 81 | 01 Jul 2021
