Abstract

Background: The desirable curation of 158,122 molecular geometries derived from the NCI set of reference molecules, together with associated properties computed using the MOPAC semi-empirical quantum mechanical method and originally deposited in 2005 into the Cambridge DSpace repository as a data collection, is reported.

Results: The procedures involved in the curation included annotating the original data using new MOPAC methods, updating the syntax of the CML documents used to express the data to ensure schema conformance, and adding new metadata describing the entries, together with an XML schema transformation to map the metadata schema to that used by the DataCite organisation. We have adopted a granularity model in which a DataCite persistent identifier (DOI) is created for each individual molecule, to enable data discovery and data metrics at this level using DataCite tools.

Conclusions: We recommend that the future research data management (RDM) of the scientific and chemical data components associated with journal articles (the “supporting information”) should be conducted in a manner that facilitates automatic periodic curation.

Graphical abstract: Standards and metadata-based curation of a decade-old digital repository dataset of molecular information.

Highlights

  • The desirable curation of 158,122 molecular geometries derived from the National Cancer Institute (NCI) set of reference molecules, together with associated properties computed using the MOPAC semi-empirical quantum mechanical method and originally deposited in 2005 into the Cambridge DSpace repository as a data collection, is reported

  • The importance of research data repositories has recently come to the fore, with funding agencies in the USA, Europe and Asia all indicating that open deposition of research data will become a mandatory aspect of their funding, and many universities are starting to consider the implications of implementing research data management, or RDM [4,5,6]

  • The configured metadata infrastructures associated with each item in the collection enable individual datafiles to be accessed based only on knowledge of the persistent identifier and media type, which can be allowed to default to a specific type
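The last highlight describes retrieving a datafile from nothing more than a persistent identifier and a media type. The following is a minimal sketch of that idea, not the authors' implementation: it assumes the media type is passed as an HTTP `Accept` header to the public DOI resolver at `https://doi.org/`, and that the default media type is `chemical/x-cml` (the CML format named in the abstract). The function name `build_datafile_request` and the DOI `10.1234/example-molecule` are hypothetical placeholders. The request is only constructed, never sent.

```python
from urllib.parse import quote
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver

def build_datafile_request(doi: str, media_type: str = "chemical/x-cml") -> Request:
    """Build (but do not send) an HTTP request for one datafile.

    Assumption: the datafile is selected by sending the media type in the
    Accept header; the mechanism used for the collection may differ.
    """
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode any unusual characters in the DOI, keeping the slash.
    url = DOI_RESOLVER + quote(doi, safe="/")
    return Request(url, headers={"Accept": media_type})

# Hypothetical DOI, for illustration only.
req = build_datafile_request("10.1234/example-molecule")
print(req.full_url, req.get_header("Accept"))
```

Omitting the second argument corresponds to letting the media type "default to a specific type"; passing a different media type would ask for another representation of the same item, if one is registered.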


Summary

Introduction

The desirable curation of 158,122 molecular geometries derived from the NCI set of reference molecules, together with associated properties computed using the MOPAC semi-empirical quantum mechanical method and originally deposited in 2005 into the Cambridge DSpace repository as a data collection, is reported. Research data repositories based on platforms such as DSpace [1] were introduced about 10 years ago, and their use in domains such as chemistry and molecular sciences has gradually increased [2, 3]. Their importance has recently come to the fore, with funding agencies in the USA, Europe and Asia all indicating that open deposition of research data will become a mandatory aspect of their funding, and many universities are starting to consider the implications of implementing research data management, or RDM [4,5,6]. An issue frequently raised in the context of research data management relates to the prospects of being able to access and use such digitally held information in the future. Recently, such questions were largely directed towards the expected longevity of physical media such as punched cards and floppy disks (both effectively extinct), hard drives, CD-ROMs, DVDs, magnetic tape etc.

