Abstract

Small-compound databases contain a large amount of information on metabolites and metabolic pathways. However, the plethora of such databases and the redundancy of their information cause major problems for analysis and standardization. If reliable means of data access are not established early in a project, the result can be mislabelled compounds, reduced statistical power, and long delays in the delivery of results. We developed MetaFetcheR, an open-source R package that links metabolite data from several small-compound databases, resolves inconsistencies between them, and covers a variety of data-fetching use cases. By benchmarking the algorithm in three independent case studies based on two published datasets, we showed that MetaFetcheR outperformed existing approaches and databases.
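The linking step can be pictured as a small fixed-point procedure. Start from whichever identifiers are known for a metabolite, follow each record's cross-references to fill in the missing ones, and flag cases where two records disagree. The Python sketch below is not MetaFetcheR's implementation or API (the package is written in R). The in-memory `XREFS` table, the `resolve` function, the four database keys and all identifier values are hypothetical placeholders. Keeping the identifier already present when sources disagree is also an assumed policy.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cross-reference table: (database, identifier) -> identifiers that
# record lists for other databases. Values are placeholders, not real mappings.
XREFS: Dict[Tuple[str, str], Dict[str, str]] = {
    ("hmdb", "hmdb_1"): {"kegg": "kegg_1", "chebi": "chebi_1"},
    ("kegg", "kegg_1"): {"pubchem": "pubchem_1"},
    ("chebi", "chebi_1"): {"pubchem": "pubchem_1", "kegg": "kegg_1"},
}

DATABASES = ("hmdb", "kegg", "chebi", "pubchem")


def resolve(seed: Dict[str, Optional[str]]) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Fill missing identifiers by following cross-references until nothing changes.

    Returns the completed row plus messages for every disagreement found.
    On a disagreement the identifier already in the row is kept (assumed policy).
    """
    row: Dict[str, Optional[str]] = {db: seed.get(db) for db in DATABASES}
    conflicts: List[str] = []
    visited = set()  # (database, identifier) pairs whose record was already read
    changed = True
    while changed:
        changed = False
        for db in DATABASES:
            ident = row[db]
            if ident is None or (db, ident) in visited:
                continue
            visited.add((db, ident))
            for other_db, other_id in XREFS.get((db, ident), {}).items():
                if other_db not in row:
                    continue  # ignore databases outside the target set
                if row[other_db] is None:
                    row[other_db] = other_id  # new identifier: another pass may follow it
                    changed = True
                elif row[other_db] != other_id:
                    conflicts.append(
                        f"{db}:{ident} lists {other_db}={other_id}, kept {row[other_db]}"
                    )
    return row, conflicts
```

For example, `resolve({"hmdb": "hmdb_1"})` fills in all four identifiers. `resolve({"hmdb": "hmdb_1", "kegg": "kegg_9"})` keeps the supplied KEGG value and reports that two records point elsewhere.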

Highlights

  • Metabolomics allows the study of small-molecule substrates and compounds that are involved in metabolic processes

  • MetaFetcheR is an R package that uses a sparse input of primary database identifiers as a reference point for retrieving identifiers from other databases

  • The two most widely used representations supported by MetaFetcheR are the simplified molecular input line entry system (SMILES) [17] and the IUPAC International Chemical Identifier (InChI) [18,19], both of which describe chemical structures using ASCII characters
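As a concrete example of these ASCII representations, ethanol is written `CCO` in SMILES and `InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3` as a standard InChI. The `/`-separated layers of the InChI hold the version, the molecular formula, the atom connectivity (`c`) and the hydrogen positions (`h`). The hypothetical helper below splits a standard InChI into those layers. It is an illustrative sketch, not part of MetaFetcheR. It assumes each layer prefix appears at most once, which holds for simple standard InChIs like this one but not for every InChI.

```python
def inchi_layers(inchi: str) -> dict:
    """Split a standard InChI string into version, formula and prefixed layers.

    Assumes each layer prefix (c, h, q, p, b, t, m, s, i, ...) occurs at most once;
    a repeated prefix would overwrite the earlier value.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len("InChI="):].split("/")
    layers = {"version": version, "formula": formula}
    for part in rest:
        # Each remaining layer begins with a one-letter prefix identifying it.
        layers[part[0]] = part[1:]
    return layers


print(inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))
```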


Summary

Introduction

Metabolomics allows the study of small-molecule substrates and compounds that are involved in metabolic processes. Pathway enrichment analysis is a widespread approach in metabolomics, and it requires metabolites to be mapped to a predefined set of unique identifiers [4]. In this setting, several issues arise when accessing, pre-processing, and analysing metabolite data. Foreign reference identifiers may be missing, which makes it difficult, and sometimes impossible, to link two records of the same metabolite in different databases. In other cases, the few reference identifiers that are present may point to incorrect compounds.
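Pathway enrichment is often run as an over-representation analysis with a one-sided hypergeometric test. The following generic Python illustration is not MetaFetcheR code, and the `ora_pvalue` function and its parameter names are hypothetical. It shows why identifier mapping matters: metabolites that cannot be mapped to the pathway database's identifiers drop out of both the query size and the hit count.

```python
from math import comb


def ora_pvalue(hits: int, pathway_size: int, query_size: int, universe_size: int) -> float:
    """One-sided hypergeometric p-value P(X >= hits) for over-representation.

    universe_size (N): identifiers in the background set
    pathway_size  (K): background identifiers annotated to the pathway
    query_size    (n): query metabolites that could be mapped to identifiers
    hits          (k): mapped query metabolites that fall in the pathway
    """
    upper = min(pathway_size, query_size)
    total = comb(universe_size, query_size)
    # Sum the hypergeometric probabilities from k up to the largest possible overlap.
    tail = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, query_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total
```

With an illustrative background of 1,000 identifiers and a 20-member pathway, `ora_pvalue(6, 20, 50, 1000)` (all 50 query metabolites mapped) is much smaller than `ora_pvalue(3, 20, 30, 1000)` (only 30 mapped, losing three pathway hits). This is one way poor mapping can cost statistical power.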

