Abstract

GenoVault is a cloud-based repository for handling Next Generation Sequencing (NGS) data. It is built on an OpenStack-based private cloud and uses services such as Keystone for authentication, Cinder for block storage, Neutron for networking, and Nova for managing compute instances. GenoVault uses object-based storage, which stores data as objects rather than as files or blocks, for faster retrieval from distributed object nodes. Along with a web-based interface, a JavaFX-based desktop client has been developed to handle the large file uploads typical of NGS datasets. Users store files in their respective object-based storage areas, and the metadata they provide during upload is used for querying the database. The GenoVault repository is designed with future needs in mind and can therefore scale both vertically and horizontally using OpenStack-based cloud features. Users can make their data shareable with the public or keep it private. Data security is ensured because every container is a separate entity in the object-based storage architecture, and Secure File Transfer Protocol (SFTP) is supported for data upload and download. Users upload data into individual containers that hold raw read files (fastq), processed alignment files (bam, sam, bed), and the output of variation detection (vcf). The GenoVault architecture allows the data to be verified for integrity and authentication before it is made available to collaborators according to the user's permissions. GenoVault is useful for maintaining organization-wide NGS data generated in various labs that has not yet been published or submitted to public repositories such as NCBI. GenoVault also supports sharing NGS data among collaborating institutions. GenoVault can thus manage vast volumes of NGS data on any OpenStack-based private cloud.
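
As a rough illustration of the container model described above, the following Python sketch groups uploaded files by type, checks integrity against a declared checksum before an object is stored, and applies a public/private flag per container. It is an in-memory toy and not GenoVault's code: the `Container` and `StoredObject` classes, the `classify` helper, the category names, and the choice of SHA-256 as the checksum are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field

# File-type grouping taken from the abstract; the category names are illustrative.
CATEGORIES = {
    "fastq": "raw_reads",
    "bam": "alignment",
    "sam": "alignment",
    "bed": "alignment",
    "vcf": "variants",
}


def classify(filename: str) -> str:
    """Return the data category for a file name, based on its extension."""
    name = filename.lower()
    if name.endswith(".gz"):  # look through gzip compression
        name = name[:-3]
    ext = name.rsplit(".", 1)[-1]
    return CATEGORIES.get(ext, "other")


@dataclass
class StoredObject:
    name: str
    data: bytes
    checksum: str  # hex digest supplied by the uploader


@dataclass
class Container:
    owner: str
    public: bool = False  # private unless the owner opts in to sharing
    objects: dict = field(default_factory=dict)

    def upload(self, name: str, data: bytes, checksum: str) -> None:
        """Store an object only if its content matches the declared checksum."""
        if hashlib.sha256(data).hexdigest() != checksum:
            raise ValueError(f"checksum mismatch for {name}")
        self.objects[name] = StoredObject(name, data, checksum)

    def can_read(self, user: str) -> bool:
        """Owners can always read; others only when the container is public."""
        return self.public or user == self.owner
```

In this sketch, a container created with `Container(owner="alice")` rejects any upload whose bytes do not match the supplied digest, and `can_read("bob")` returns `False` until the owner sets `public` to `True`.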

Highlights

  • Next-generation sequencing (NGS) platforms are producing enormous volumes of data with the introduction of high throughput technologies [1]

  • In order to improve the ease of access to such datasets, we have developed a user-friendly platform named GenoVault for the retrieval and storage of NGS data

  • In order to improve the ease of access to NGS datasets, we have developed GenoVault with a user-friendly interface for storage and retrieval of genomics data [31]


Summary

Introduction

With the introduction of high-throughput technologies, next-generation sequencing (NGS) platforms are producing enormous volumes of data [1]. Users can upload genomic sequence data, together with metadata, to GenoVault through the web-based or JavaFX interface; uploads are stored in a distributed manner on the OpenStack-based cloud. GenoVault was developed with a range of technologies and platforms, including Java, cloud computing with OpenStack [29], Swift object storage, web services, Swing, Struts, and JSF. Object-based storage for such data needs to handle high capacity and provide low latency. This can be achieved with hyperscale environments or, more traditionally, with NAS. After these files were uploaded into GenoVault, they were stored along with their corresponding metadata, such as accession details, sequencing platform, gender of the sample, and population details. These metadata later aid retrieval of subsets according to user requirements. The data can be downloaded from the website of the International Genome Sample Resource (IGSR) [47].
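
The metadata-driven retrieval described above can be pictured as filtering a catalogue of records on the fields a user supplies. The sketch below is a hypothetical in-memory stand-in for a database query, not GenoVault's implementation: the `SampleRecord` fields mirror the metadata named in the text, while the `find_samples` helper, the exact field names, and the sample values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleRecord:
    accession: str
    platform: str
    gender: str
    population: str


def find_samples(records, **criteria):
    """Return records whose fields equal every supplied criterion."""
    unknown = set(criteria) - set(SampleRecord.__dataclass_fields__)
    if unknown:
        raise KeyError(f"unknown metadata field(s): {sorted(unknown)}")
    return [
        r for r in records
        if all(getattr(r, key) == value for key, value in criteria.items())
    ]


# Placeholder values, not real accessions or population codes.
catalogue = [
    SampleRecord("ACC0001", "Illumina", "female", "POP_A"),
    SampleRecord("ACC0002", "Illumina", "male", "POP_B"),
    SampleRecord("ACC0003", "PacBio", "female", "POP_A"),
]

subset = find_samples(catalogue, platform="Illumina", gender="female")
print([r.accession for r in subset])  # ['ACC0001']
```

Rejecting unknown field names up front keeps a misspelled criterion from silently returning an empty subset.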

