Your Spreadsheets Can Be FAIR: A Tool and FAIRification Workflow for the eNanoMapper Database.

Nikolay Kochev, Peter Ritchie, Vesselina Paskaleva, Nina Jeliazkova, Vedrin Jeliazkov, Luchesar Iliev, Gergana Tancheva

doi:10.3390/nano10101908

Abstract

The field of nanoinformatics is rapidly developing and provides data-driven solutions in the area of nanomaterials (NM) safety. Safe by Design approaches are encouraged and promoted through regulatory initiatives and multiple scientific projects. Experimental data are at the core of nanoinformatics processing workflows for risk assessment. Nanosafety data are predominantly recorded in Excel spreadsheet files. Although spreadsheets are convenient for experimentalists, they pose great challenges for subsequent processing into databases because of the variability of the templates used, the specific details provided by each laboratory, and the need for proper metadata documentation and formatting. In this paper, we present a workflow to facilitate the conversion of spreadsheets into a FAIR (Findable, Accessible, Interoperable, and Reusable) database, with the pivotal aid of the NMDataParser tool, developed to streamline the mapping of the original file layout into the eNanoMapper semantic data model. The NMDataParser is an open-source Java library and application that uses a JSON configuration to define the mapping. We describe the JSON configuration syntax and the approaches applied for parsing different spreadsheet layouts used by the nanosafety community. Examples of using the NMDataParser tool in nanoinformatics workflows are given. Challenging cases are discussed and appropriate solutions are proposed.

Highlights

The nanotechnology field is an increasingly dynamic area in materials science research and development, introducing novel materials with unique properties due to their size in the range of nanometers [1].
The analysis of data and metadata is an iterative process that requires consultations with domain experts to explain the file content, the layout, and the specifics of the assay, to provide links to protocols and Standard Operating Procedures (SOPs), and to confirm correct ontology annotations of free text found in the files.
The major advantage of the eNanoMapper/Ambit data model is its well-defined semantics, which has already been used to integrate large and diverse nanosafety data, with only minor enhancements introduced for the annotation and arrangement of experiment entries.

Summary

Introduction

The nanotechnology field is an increasingly dynamic area in materials science research and development, introducing novel materials with unique properties due to their size in the range of nanometers [1]. Nanomaterials are used across almost all industrial sectors [2], where their “beautiful” properties inspire widespread usage but also raise questions and concerns about their influence on human health and on the environment [3]. Nanomaterial properties, such as surface area, influence various biological interactions and health effects, including entry into the bloodstream; translocation in cells, tissues, and functional organelles; and an increased risk of toxic effects such as inflammation, genotoxicity, autophagy, neurotoxicity, and cell death [4]. The field of nanoinformatics has rapidly developed to provide data-driven solutions in the area of nanomaterials safety [5]. Database technologies are extensively used for storing NM information, enabling query and analysis of chemical and physical properties, bioassay experiments, and impacts on humans and nature, especially in the context of nanosafety and risk assessment.

Methods

Results

Discussion

Conclusion