Metadata Quality Research Articles

A new scientific discipline has emerged within the last decade at the intersection of informatics, computer science and biology: Imageomics. Like most other -omics fields, Imageomics uses emerging technologies to analyze biological data, in this case data derived from images. One of the most widely applied analysis methods for image datasets is Machine Learning (ML). In 2019, we started working on a United States National Science Foundation (NSF) funded project, Biology Guided Neural Networks (BGNN), with the purpose of extracting biological information by using neural networks together with biological guidance such as species descriptions, identifications, phylogenetic trees and morphological annotations (Bart et al. 2021). Even though the variety and abundance of biological data are sufficient for some ML analyses and the data are openly accessible, researchers still spend up to 80% of their time preparing data into a usable, AI-ready format, leaving only 20% for exploration and modeling (Long and Romanoff 2023). For this reason, we have built a dataset composed of digitized fish specimens, taken either directly from collections or from specialized repositories. The range of digital representations we cover is broad and growing, from photographs and radiographs to CT scans and even illustrations. We have added new groups of vocabularies to the dataset management system, including image quality metadata, extended image metadata and batch metadata. With the image quality metadata and extended image metadata, we aimed to extract information from the digital objects that can help ML scientists with filtering, image processing and object recognition routines.
Image quality metadata provides information about objects contained in the image, features and condition of the specimen, and some basic visual properties of the image, while extended image metadata provides information about technical properties of the digital file and the digital multimedia object (Bakış et al. 2021, Karnani et al. 2022, Leipzig et al. 2021, Pepper et al. 2021, Wang et al. 2021) (see details on the Fish-AIR vocabulary web page). Batch metadata is used to separate different datasets and facilitates downloading and uploading data in batches with additional batch information and supplementary files. Additional flexibility, built into the database infrastructure using an RDF framework, will enable the system to host different taxonomic groups, which might require new metadata features (Jebbia et al. 2023). By combining these features with the FAIR (Findable, Accessible, Interoperable, Reusable) principles and reproducibility, we provide Artificial Intelligence Readiness (AIR; Long and Romanoff 2023) to the dataset. Fish-AIR provides an easy-to-access, filtered, annotated and cleaned biological dataset for researchers from different backgrounds and facilitates the integration of biological knowledge based on digitized preserved specimens into ML pipelines. Because of the flexible database infrastructure and the addition of new datasets, researchers will also be able to access additional types of data (such as landmarks, specimen outlines, annotated parts, and quality scores) in the near future. Already, the dataset is the largest and most detailed AI-ready fish image dataset with an integrated Image Quality Management System (Jebbia et al. 2023, Wang et al. 2021).
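The filtering use case mentioned in the abstract above can be sketched as follows. This is a minimal illustration only: the record fields (`image_type`, `specimen_visible`, `quality_score`) and the function `select_for_training` are hypothetical names chosen for the example and are not taken from the Fish-AIR vocabulary or its API.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    # All field names are hypothetical stand-ins for image quality metadata.
    image_id: str
    image_type: str         # e.g. "photograph", "radiograph", "ct_scan"
    specimen_visible: bool  # assumed flag: is the whole specimen in frame?
    quality_score: int      # assumed scale where higher means better


def select_for_training(records, allowed_types, min_quality):
    """Keep only records whose metadata passes every filter."""
    return [
        r for r in records
        if r.image_type in allowed_types
        and r.specimen_visible
        and r.quality_score >= min_quality
    ]


# Toy records, invented for the example.
records = [
    ImageRecord("img-001", "photograph", True, 9),
    ImageRecord("img-002", "photograph", False, 8),
    ImageRecord("img-003", "radiograph", True, 7),
    ImageRecord("img-004", "photograph", True, 3),
]

selected = select_for_training(records, {"photograph"}, min_quality=5)
print([r.image_id for r in selected])
```

With per-image metadata of this kind, an ML pipeline can discard unsuitable images before any pixel data is loaded.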

In functional magnetic resonance imaging (fMRI) of the brain, the measured signal is corrupted by several noise sources (e.g. physiological, motion, and thermal) and depends on the image acquisition. Imaging at ultrahigh field strength is becoming increasingly popular as it offers increased spatial accuracy, which is of particular benefit in brainstem neuroimaging given the small cross-sectional area of most nuclei. However, physiological noise scales with field strength in fMRI acquisitions. Although this problem is in part solved by decreasing voxel size, adequate physiological denoising remains of utmost importance in brainstem-focused fMRI experiments. Multi-echo sequences have been reported to facilitate highly effective denoising through the TE-dependence of Blood Oxygen Level Dependent (BOLD) signals, in a denoising method referred to as multi-echo independent component analysis (ME-ICA). How ME-ICA compares to other data-driven denoising approaches at ultrahigh field strength has not been explored previously. In the current study, we compared the efficacy of several denoising methods, including anatomical component-based correction (aCompCor), the aggressive and non-aggressive options of ICA-based Automatic Removal of Motion Artifacts (ICA-AROMA), ME-ICA, and a combination of ME-ICA and aCompCor. We assessed several data quality metrics, including temporal signal-to-noise ratio (tSNR), delta variation signal (DVARS), spectral density of the global signal, functional connectivity and Shannon spectral entropy. Moreover, we looked at the ability of each method to uncouple the global signal and respiration. In line with previous reports at lower field strengths, we demonstrate that after applying ME-ICA, the data are best post-processed with a method such as aCompCor in order to remove spatially diffuse noise.
Our findings indicate that two approaches are highly effective for denoising multi-echo data acquired at 7T: ME-ICA combined with aCompCor, and the aggressive option of ICA-AROMA. Of the two, ME-ICA combined with aCompCor potentially preserves more signal-of-interest.
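Two of the data quality metrics named in the abstract above, tSNR and DVARS, can be illustrated with a short sketch. It uses their commonly cited definitions (temporal mean over temporal standard deviation for tSNR, and root mean square over voxels of the frame-to-frame signal change for DVARS). The population standard deviation, the lack of any normalization, and the toy data are assumptions made for the example and do not reproduce the study's pipeline.

```python
import math
from statistics import mean, pstdev


def tsnr(series):
    """Temporal SNR of one voxel: temporal mean / temporal standard deviation.

    Uses the population standard deviation (an assumption; tools differ).
    """
    sd = pstdev(series)
    return math.inf if sd == 0 else mean(series) / sd


def dvars(volumes):
    """DVARS per frame: RMS over voxels of the change from the previous volume.

    `volumes` is a list of time points, each a flat list of voxel intensities.
    Returns one value per consecutive pair of volumes.
    """
    values = []
    for prev, curr in zip(volumes, volumes[1:]):
        squared_diffs = [(c - p) ** 2 for p, c in zip(prev, curr)]
        values.append(math.sqrt(mean(squared_diffs)))
    return values


# Toy data (invented): 4 time points, 2 voxels per volume.
volumes = [
    [100.0, 200.0],
    [102.0, 200.0],
    [100.0, 200.0],
    [102.0, 200.0],
]

voxel0 = [v[0] for v in volumes]
print(tsnr(voxel0))    # temporal mean 101, standard deviation 1
print(dvars(volumes))  # three values, one per frame transition
```

In broad terms, effective denoising is expected to raise tSNR and lower DVARS, which is why such metrics are used to compare methods.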

Articles published on Metadata Quality

Metadata remediation through migration, post-migration or necessary clean-up: A roadmap for success

FAIRness in digital forensics datasets’ metadata – and how to improve it

Completeness degree of publication metadata in eight free-access scholarly databases

Denoising of Geochemical Data using Deep Learning–Implications for Regional Surveys

Data reduction in protein serial crystallography.

A Bayesian Approach to Estimate Maternal Mortality Globally Using National Civil Registration Vital Statistics Data Accounting for Reporting Errors

FAIR+R: Making Clinical Data Reliable Through Qualitative Metadata.

Repairing raw metadata for metadata management

Guidelines on assigning the subjects of theses and dissertations in repositories

Unsupervised Machine Learning Clustering of Seismic and Infrasound Data Quality Metrics

GEOSCOPE Network: 40 Yr of Global Broadband Seismic Data

Notes on the data quality of bibliographic records from the MEDLINE database.

Improving Testing of Deep-learning Systems

The impact of data quality monitoring of a multicenter prospective registry of cardiac implantable electronic devices

A Crowdsourcing Recommendation Model for Image Annotations in Cultural Heritage Platforms

Fast and efficient identification of anomalous galaxy spectra with neural density estimation

On Image Quality Metadata, FAIR in ML, AI-Readiness and Reproducibility: Fish-AIR example

Comparing the efficacy of data-driven denoising methods for a multi-echo fMRI acquisition at 7T

ENRICHING LARGE DOCUMENT STORES WITH INTELLIGENT METADATA: A FRAMEWORK FOR EFFECTIVE KNOWLEDGE MANAGEMENT AND APPLIED ANALYTICS

CM-Explorer: Dissecting Data Ingestion Problems
