NWB Query Engines: Tools to Search Data Stored in Neurodata Without Borders Format.

Petr Ježek, Jeffery L Teeters, Friedrich T Sommer

doi:10.3389/fninf.2020.00027

Open Access

https://doi.org/10.3389/fninf.2020.00027

Abstract

The Neurodata Without Borders (NWB) format is a current technology for storing neurophysiology data along with the associated metadata. Data stored in the format are organized into separate HDF5 files, each file usually storing the data associated with a single recording session. While the NWB format provides a structured method for storing data, so far there have been no tools that enable searching a collection of NWB files to find data of interest for a particular purpose. We describe here three tools that enable searching NWB files. The tools have different features, making each of them most useful for a particular task. The first tool, called the NWB Query Engine, is written in Java. It allows searching the complete content of NWB files. It was designed for the first version of NWB (NWB 1) and supports most (but not all) features of the most recent version (NWB 2). For some searches, it is the fastest tool. The second tool, called “search_nwb”, is written in Python and also allows searching the complete contents of NWB files. It works with both NWB 1 and NWB 2, as does the third tool. The third tool, called “nwbindexer”, enables searching a collection of NWB files using a two-step process. In the first step, a utility is run that creates an SQLite database containing the metadata in a collection of NWB files. This database is then searched in the second step, using another utility. Once the index is built, this two-step process allows faster searches than the other tools, but the searches are not as complete. All three tools use a simple query language that was developed for this project. Software integrating the three tools into a web interface is also provided, which enables searching NWB files by submitting a web form.

Highlights

Effective management of neurophysiology data requires storing the data on disk or in some other medium, and having a method in place to enable efficient search for finding parts in the data that are needed for some purpose.
The third tool, called “nwbindexer”, is written in Python. It works by first creating an SQLite index of content in one or more NWB files, and then allows searches to be performed on the index.
Once the index is built, this enables faster searches than the other two methods. It allows searching all of the table representations described in section 2.2.2, but does not allow searching the entire contents of an NWB file because only a subset of the file is stored in the index.
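
The two-step idea behind an index-based search (build an SQLite index once, then query the index instead of the files) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the nwbindexer implementation: the table layout, the function names `build_index` and `search_index`, the example file names, and the in-memory dictionary standing in for metadata read from NWB files are all hypothetical, and the plain SQL lookup stands in for the project's own query language.

```python
import sqlite3

# Hypothetical stand-in for metadata that would be read from NWB (HDF5) files.
# A real indexer would walk each file; here the values are hard-coded.
FAKE_FILES = {
    "session_a.nwb": {
        "/general/subject/species": "Mus musculus",
        "/general/subject/age": "P90D",
    },
    "session_b.nwb": {
        "/general/subject/species": "Rattus norvegicus",
        "/general/subject/age": "P60D",
    },
}


def build_index(conn, files):
    """Step 1: store (file, path, value) rows for every metadata item."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata (file TEXT, path TEXT, value TEXT)"
    )
    rows = [
        (fname, path, str(value))
        for fname, meta in files.items()
        for path, value in meta.items()
    ]
    conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)
    conn.commit()


def search_index(conn, path, pattern):
    """Step 2: query the index only; the original files are not opened."""
    cur = conn.execute(
        "SELECT file, value FROM metadata "
        "WHERE path = ? AND value LIKE ? ORDER BY file",
        (path, pattern),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
build_index(conn, FAKE_FILES)
matches = search_index(conn, "/general/subject/species", "Mus%")
print(matches)
```

The sketch also shows the trade-off noted above: the second step can only find what the first step chose to store, so anything left out of the index is invisible to the search.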

Summary

Introduction

Effective management of neurophysiology data requires storing the data on disk or in some other medium, and having a method in place to enable efficient search for finding parts in the data that are needed for some purpose. The efficiency of data search methods depends on how the data are stored. For various reasons (including convenience, the potentially large size of the data, ease of data exchange, and compatibility with software tools), neurophysiology data are currently often not stored as a single integrated collection such as a relational database; instead, the data are stored in multiple, independent files. These files are typically organized by experimental session, so that data recorded in separate sessions of an experiment are stored in separate files. Examples are: the BrainVision data format used by Brain Products Analyzer; the European Data Format (EDF) (Kemp and Olivan, 2003); standard ASCII used by EEGLab (Delorme and Makeig, 2004); the BDF format, a variation of EDF used in BioSemi products; Spike format (Smith, 2003); Klustakwik (Harris et al., 2000; Rossant et al., 2016); NIX (Stoewer et al., 2014); BIDS (Gorgolewski et al., 2016); NSDF (Ray et al., 2016); and Sonata (Dai et al., 2020).

Methods

Results

Conclusion