Abstract

Early-stage experimental data in structural biology are generally unmaintained and inaccessible to the public. There is a growing belief that these data, which form the basis for every macromolecular structure determined in this field, must be archived and, in due course, published. The widespread use of shared scientific facilities such as synchrotron beamlines complicates data storage, access and movement, as does the growing number of remote users. This work describes a prototype system that adapts existing federated cyberinfrastructure technology and techniques to significantly improve the operational environment for users and administrators of synchrotron data collection facilities used in structural biology. The prototype is built from software in the Virtual Data Toolkit and Globus, and it brings together federated users and facilities from the Stanford Synchrotron Radiation Lightsource, the Advanced Photon Source, the Open Science Grid, the SBGrid Consortium and Harvard Medical School. The performance of the prototype, and experience with it, provide a model for data management at shared scientific facilities.

Highlights

  • The field of structural biology provides atomic-scale models of macromolecules

  • The trial of the prototype system consisted of configuring the Stanford Synchrotron Radiation Lightsource (SSRL) and the Northeast Collaborative Access Team (NE-CAT) as Globus Online (GO) service endpoints, setting up the necessary X.509 authentication system, and mapping grid identities to local user identities at the participating sites

  • Users requested grid accounts through the SBGrid Science Portal, which automatically registered them with the SBGrid virtual organizations (VOs) and created a proxy certificate with the National Center for Supercomputing Applications (NCSA) MyProxy server
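The highlights mention mapping grid identities to local user identities at each site, but the summary does not say which mechanism the prototype used. A common one in Globus and VDT deployments is a grid-mapfile: a text file pairing each X.509 certificate subject DN (quoted, since DNs contain spaces) with one or more comma-separated local account names. The sketch below assumes that format. The DNs, account names, and the helpers `parse_gridmap` and `local_account` are hypothetical illustrations, not code from the prototype.

```python
import shlex
from typing import Dict, List, Optional

# Hypothetical grid-mapfile content; every DN and account name here is invented.
SAMPLE_GRIDMAP = '''
# certificate subject DN                            local account(s)
"/DC=org/DC=example/OU=People/CN=Jane Doe 12345"    jdoe
"/DC=org/DC=example/OU=People/CN=Raj Patel 67890"   rpatel,sbgrid
'''


def parse_gridmap(text: str) -> Dict[str, List[str]]:
    """Map each certificate subject DN to its list of local account names."""
    mapping: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # shlex honours the double quotes around a DN that contains spaces.
        parts = shlex.split(line)
        if len(parts) < 2:
            continue  # malformed entry: a DN with no local account
        dn = parts[0]
        # Rejoin the remaining tokens so "a, b" and "a,b" parse the same way.
        accounts = "".join(parts[1:]).split(",")
        mapping[dn] = [a.strip() for a in accounts if a.strip()]
    return mapping


def local_account(mapping: Dict[str, List[str]], dn: str) -> Optional[str]:
    """Return the first (default) local account for a DN, or None if unmapped."""
    accounts = mapping.get(dn)
    return accounts[0] if accounts else None


if __name__ == "__main__":
    gridmap = parse_gridmap(SAMPLE_GRIDMAP)
    print(local_account(gridmap, "/DC=org/DC=example/OU=People/CN=Raj Patel 67890"))
```

Treating the first listed account as the default follows the usual grid-mapfile convention; a site could instead let the user choose among the listed accounts.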


Summary

Introduction

The field of structural biology provides atomic-scale models of macromolecules. While these models are typically made public through the Protein Data Bank (PDB; Berman, 2000), the source experimental data used to establish the models are generally not published. Advances in technology and automation at shared facilities such as synchrotron beamlines are producing higher data rates, with an anticipated need to process terabytes per day in the near future (Soltis et al., 2008). These challenges resemble those faced in genomics research and high-energy physics: centralized data collection at a shared facility by a large group of users with independent affiliations and collaborations.

