Abstract
Recently, a number of organisations have called for open access to scientific information and especially to the data obtained from publicly funded research, among which the Royal Society report and the European Commission press release are particularly notable. It has long been accepted that building research on the foundations laid by other scientists is both effective and efficient. Regrettably, some disciplines, chemistry being one, have been slow to recognise the value of sharing and have thus been reluctant to curate their data and information in preparation for exchanging it. The very significant increases in both the volume and the complexity of the datasets produced have encouraged the expansion of e-Research, and stimulated the development of methodologies for managing, organising, and analysing "big data". We review the evolution of cheminformatics, the amalgam of chemistry, computer science, and information technology, and assess the wider e-Science and e-Research perspective. Chemical information does matter, as do matters of communicating data and collaborating with data. For chemistry, unique identifiers, structure representations, and property descriptors are essential to the activities of sharing and exchange. Open science entails the sharing of more than mere facts: for example, the publication of negative outcomes can facilitate better understanding of which synthetic routes to choose, an aspiration of the Dial-a-Molecule Grand Challenge. The protagonists of open notebook science go even further and exchange their thoughts and plans. We consider the concepts of preservation, curation, provenance, discovery, and access in the context of the research lifecycle, and then focus on the role of metadata, particularly the ontologies on which the emerging chemical Semantic Web will depend. Among our conclusions, we present our choice of the "grand challenges" for the preservation and sharing of chemical information.
Highlights
Future innovation in chemistry, as in all the physical and life sciences, depends on collaboration and interdisciplinary research
Although we focus primarily on academic research, we acknowledge the vital significance of data management for commercial organizations that depend on chemistry, for example the pharmaceutical industry: to date, drug discovery has been the foremost application of cheminformatics
Although the Chemical Sciences (CMCS) is the only chemistry project they examined in their survey, they raised several general issues that remain pertinent today, including, but not limited to: rich provenance information can become larger than the data it describes; provenance usability depends on federating descriptive information; coping with missing or deleted data requires further consideration
Summary
Chemical information matters: an e-Research perspective on information and data sharing in the chemical sciences. A number of organisations have called for open access to scientific information and especially to the data obtained from publicly funded research, among which the Royal Society report and the European Commission press release are notable. Some disciplines, chemistry being one, have been slow to recognise the value of sharing and have been reluctant to curate their data and information in preparation for exchanging it. We consider the concepts of preservation, curation, provenance, discovery, and access in the context of the research lifecycle, and focus on the role of metadata, particularly the ontologies on which the emerging chemical Semantic Web will depend. We present our choice of the "grand challenges" for the preservation and sharing of chemical information.