Abstract

The National Synchrotron Light Source II (NSLS-II), operating at Brookhaven National Laboratory since 2014 for the US Department of Energy, is one of the newest and brightest storage-ring synchrotron facilities in the world. NSLS-II, like other facilities, provides pre-processing of the raw data and some analysis capabilities to its users. We describe the research collaborations and open source infrastructure developed at large instrument facilities such as NSLS-II for the purpose of curating high-value scientific data in the early stages of the data lifecycle. Data acquisition and curation tasks include storing experiment configuration and detector metadata, and acquiring raw data with infrastructure that converts proprietary instrument formats to industry standards. In addition, we describe a specific effort for discovering sample information at NSLS-II and tracing the provenance of analysis performed on acquired images. We show that curation tasks must be embedded into software along the data lifecycle for effectiveness and ease of use, and that loosely defined collaborations evolve around shared open source tools. Finally, we discuss best practices for experimental metadata capture in such facilities, data access, and the new challenges of scale and complexity posed by AI-based discovery for the synthesis of new materials.

Highlights

  • The National Synchrotron Light Source II (NSLS-II), operating at Brookhaven National Laboratory (BNL) since 2014 for the US Department of Energy (DOE), is one of the newest and brightest storage-ring synchrotron facilities in the world

  • This paper describes the research collaborations and open source infrastructure developed at large experimental facilities, such as NSLS-II, for the purpose of curating high-value scientific data in the early stages of the data lifecycle

  • In such facilities, curation activities must be embedded into software along the data lifecycle, and effective collaborations evolve around shared open source tools

Introduction

Experiments conducted by users from the academic, government, and industry sectors, and facilitated by the beamline scientific staff, acquire data with a range of high-throughput detectors. Because today's experiments are conducted as ensembles of related experiments rather than as single experiments, users and their samples tend to move between detectors at the same facility and between facilities, taking advantage of the different characterization methods available across the world. The wide variety of detectors, data acquisition methods, storage systems, and data curation philosophies presents significant challenges to users and to the data management support teams at the facilities. These challenges are addressed by using and customizing open source software shared among loosely defined collaborations.

