Bioinformatics develops methods to understand biological data and to explain biological processes. The discipline operates on huge data sets and is computationally intensive. The fast growth of the field and its specialization into many subdisciplines make it hard to search for, find, and keep track of scientific results relevant to one’s own research. Knowing about prior studies, and the data they rely on, is a prerequisite for advancing one’s own research and for ensuring effective progress of bioinformatics as a field. At the same time, scientists’ own research and reputation benefit when their research data is as searchable as possible. The FAIR data movement draws on this motivation. Before research data can be accessed and reused, however, it must first be found or discovered. For this purpose, the large space of highly diverse research data must be made navigable: it must be highly searchable. To increase searchability, we have devised a metadata schema that describes the entire field of bioinformatics with a small set of descriptors. Our metadata schema is inspired by Dublin Core and aims to replicate its success in the domain of bioinformatics. The schema is meant to complement the many metadata schemes that bioinformaticians use in practice by extracting their common core, yielding a schema that can be used across bioinformatics subdisciplines. Our minimal schema for bioinformatics metadata is complemented by a Web-based annotation tool in which such metadata can be provided in an effective, time-saving, and concise manner.