Hundreds Of Terabytes Research Articles

A SAN (Storage Area Network, a high-speed, block-level storage device) based centralized data storage system for SST-1 (Steady State Superconducting Tokamak) was envisaged to make SST-1 operation and experimental data centrally available for archival as well as retrieval [2]. Considering the initial data volume requirement, a SAN-based data storage system of ∼10 TB (terabytes) capacity has been configured and installed with an optical fiber backbone, taking into account compatibility with the existing Ethernet network of SST-1. The system has been designed with a 3-tiered architecture and the GFS (Global File System) cluster file system with multipath support. Tier-1, of ∼3 TB (frequent access, low data storage capacity), comprises Fibre Channel (FC) based hard disks for optimum throughput. Tier-2, of ∼6 TB (less frequent access, high data storage capacity), comprises SATA based hard disks. Tier-3 will be planned later to store offline historical data. In the SAN configuration, two tightly coupled storage servers (in a cluster configuration) work together to increase performance and reliability, distribute the workload, and provide access to files from either server regardless of the physical location of the file. Different RAID (Redundant Array of Independent Disks) groups are created with both FC and SATA hard disks to increase the reliability, security and performance of the storage system. The adopted SAN-based data storage for SST-1 is modular and robust, and allows future expansion: storage modules can be added as and when required without changing the existing storage architecture. The data read/write time of the configured system is adequate for the present throughput requirements of the individual subsystems. Sufficient hardware redundancy has been incorporated to ensure uninterrupted, 24×7 availability of experimental data within the intranet. The installed storage system can be expanded up to ∼100 TB of capacity with the existing controller pair present in the SAN configuration.

Abstract: To target and personalize cancer therapies to the genomic aberrations present in a particular patient's tumor, researchers need to identify the genes that drive the progression of malignant tumors. This requires analysis of somatic mutations from large samples of patients to identify driver mutations out to the “tail end” of the frequency distribution. Community genomics data sets from the TCGA and ICGC projects represent a valuable resource to which researchers can add their own data to gain statistical power in their analyses. The main problem with this approach is the highly fragmented storage of public and private data and the inefficient access to public data. Researchers spend weeks to months downloading hundreds of terabytes of data from central repositories before computations can begin. What is needed is a data “safe haven” where researchers can bring compute to the reference data without incurring bulky data transfers or duplicative storage costs, in an environment that protects the privacy of the patients’ data. In collaboration with the International Cancer Genome Consortium, we developed ShareSeq, a genomic data safe haven platform that provides an informatics solution for storing, handling and analyzing protected identifiable genomic data. This resource leverages Annai-GNOS, the technology that we developed together with UCSC to create and manage the CGHub TCGA repository and that is being used in the ICGC Pan Cancer Analysis of Whole Genomes project, and combines it with a high-performance compute environment and an array of tools to process and analyze genomic data. Built using a walled garden approach, in which the data is stored, processed and managed within the security of the system, ShareSeq avoids the complexity of assured end point encryption. GeneTorrent, our fast and secure file transfer mechanism, enables researchers’ private information to be transferred into the walled garden simply and securely so that it can be combined with the public datasets. ShareSeq differs dramatically from the traditional cloud in four features: (i) formal mechanisms and a service level agreement to store protected identifiable genomic data securely and safely, built into the system from the ground up; (ii) a design specific to genomic computing over large shared data sets, supporting common bioinformatics workflow tools; (iii) fast download of and access to raw genomic information and its metadata; and (iv) access controls leveraging federated authentication systems that Data Access Committees use to authorize access to the restricted data. ShareSeq is initially hosting raw, normalized, and processed data from the ICGC, but we envision that over time it will host an increasing number of high-value public reference genomic datasets and add standards-based interfaces promoted by the Global Alliance for Genomics and Health to allow broader data discovery and sharing.

Citation Format: Francisco M. De La Vega, Ying Wu, Tal Shmaya, Thomas Schlumpberger, James Wiley, Akshay Patel, Raja Hayek. A novel data safe haven approach to bring analyses to the International Cancer Genome Consortium data. [abstract]. In: Proceedings of the 106th Annual Meeting of the American Association for Cancer Research; 2015 Apr 18-22; Philadelphia, PA. Philadelphia (PA): AACR; Cancer Res 2015;75(15 Suppl):Abstract nr LB-308. doi:10.1158/1538-7445.AM2015-LB-308

Articles published on Hundreds Of Terabytes

Robust Cell Detection for Large-Scale 3D Microscopy Using GPU-Accelerated Iterative Voting.

Improved Search of Large Transcriptomic Sequencing Databases Using Split Sequence Bloom Trees.

Task scheduling and virtual resource optimising in Hadoop YARN-based cloud computing environment

Towards Accurate Statistical Analysis of Security Margins: New Searching Strategies for Differential Attacks

TDat: An Efficient Platform for Processing Petabyte-Scale Whole-Brain Volumetric Images.

Backup to the Future

A fully coupled hybrid computational aeroacoustics method on hierarchical Cartesian meshes

Toward large-scale crop production forecasts for global food security

Large scale research data archiving: Training for an inconvenient technology

Archiving and retrieval of experimental data using SAN based centralized storage system for SST-1

The role of geophytes in stone-age hunter-gatherer diets

Large-scale time-lapse microscopy of Oct4 expression in human embryonic stem cell colonies

Study on Hadoop Cluster

Open Polar Server (OPS)—An Open Source Infrastructure for the Cryosphere Community

NeuroBlocks--Visual Tracking of Segmentation and Proofreading for Large Connectomics Projects.

A Web services accessible database of turbulent channel flow and its use for testing a new integral wall model for LES

Operational experience running Hadoop XRootD Fallback

Data correlation‐based analysis methods for automatic memory forensic

Abstract LB-308: A novel data safe haven approach to bring analyses to the International Cancer Genome Consortium data

Data compression for the first G-APD Cherenkov Telescope
