Prototype semantic infrastructure for automated small molecule classification and annotation in lipidomics

Leonid L Chepelev,Alexandre Kouznetsov,Alexandre Riazanov,Michel Dumontier,Christopher Jo Baker,Hong Sang Low

doi:10.1186/1471-2105-12-303

Open Access

https://doi.org/10.1186/1471-2105-12-303

Abstract

Background

The development of high-throughput experimentation has led to astronomical growth in biologically relevant lipids and lipid derivatives identified, screened, and deposited in numerous online databases. Unfortunately, efforts to annotate, classify, and analyze these chemical entities have largely remained in the hands of human curators using manual or semi-automated protocols, leaving many novel entities unclassified. Since chemical function is often closely linked to structure, accurate structure-based classification and annotation of chemical entities is imperative to understanding their functionality.

Results

As part of an exploratory study, we have investigated the utility of semantic web technologies in automated chemical classification and annotation of lipids. Our prototype framework consists of two components: an ontology and a set of federated web services that operate upon it. The formal lipid ontology we use here extends a part of the LiPrO ontology and draws on the lipid hierarchy in the LIPID MAPS database, as well as literature-derived knowledge. The federated semantic web services that operate upon this ontology are deployed within the Semantic Annotation, Discovery, and Integration (SADI) framework. Structure-based lipid classification is enacted by two core services. Firstly, a structural annotation service detects and enumerates relevant functional groups for a specified chemical structure. A second service reasons over lipid ontology class descriptions using the attributes obtained from the annotation service and identifies the appropriate lipid classification. We extend the utility of these core services by combining them with additional SADI services that retrieve associations between lipids and proteins and identify publications related to specified lipid types. We analyze the performance of SADI-enabled eicosanoid classification relative to the LIPID MAPS classification and reflect on the contribution of our integrative methodology in the context of high-throughput lipidomics.

Conclusions

Our prototype framework is capable of accurate automated classification of lipids and facile integration of lipid class information with additional data obtained with SADI web services. The potential of programming-free integration of external web services through the SADI framework offers an opportunity for development of powerful novel applications in lipidomics. We conclude that semantic web technologies can provide an accurate and versatile means of classification and annotation of lipids.

Highlights

In conjunction with auxiliary SADI services, inferred class information is integrated with (i) information relating to proteins that interact with lipids belonging to the inferred lipid types and (ii) literature references relevant to the class of the lipid under investigation.
To assess the quality of our framework, we document the performance of our classification service on a small subset of lipids, namely eicosanoids, found in the LIPID MAPS database.

Summary

Introduction

Since chemical function is often closely linked to structure, accurate structure-based classification and annotation of chemical entities is imperative to understanding their functionality. Lipids and their metabolic derivatives play a crucial role in the biology of many living organisms. Lipidomics generates a large amount of heterogeneous chemical and biochemical data that must be integrated and analyzed in a systematic manner. Such efforts are hampered by the lack of consistent lipid classification. While the results of classification can be used to annotate chemical entities with class membership information, in this work we shall regard annotation as a distinct activity from classification, because we shall use molecular annotation information that is not an outcome of classification in order to enable automated classification of molecular entities.
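The distinction between annotation and classification can be illustrated with a minimal sketch in Python. It is not the framework's implementation: the framework reasons over ontology class descriptions through SADI services, whereas this sketch uses plain dictionaries. The functional-group names, the class names, and the rules in `CLASS_RULES` are hypothetical and are not taken from the lipid ontology or LIPID MAPS. The annotation step is also simplified: the detected groups are supplied by hand rather than derived from a chemical structure.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical class descriptions: each class lists the minimum number of
# each functional group a molecule must carry. Names and rules are
# illustrative assumptions, not definitions from the lipid ontology.
CLASS_RULES: Dict[str, Dict[str, int]] = {
    "fatty_acid": {"carboxylic_acid": 1},
    "hydroxy_fatty_acid": {"carboxylic_acid": 1, "hydroxyl": 1},
    "epoxy_fatty_acid": {"carboxylic_acid": 1, "epoxide": 1},
}


def annotate(detected_groups: Iterable[str]) -> Dict[str, int]:
    """Annotation step: enumerate functional groups as name -> count.

    A real annotation service would detect the groups from the structure;
    here they are passed in directly (an assumption of this sketch).
    """
    return dict(Counter(detected_groups))


def classify(annotation: Dict[str, int],
             rules: Dict[str, Dict[str, int]] = CLASS_RULES) -> List[str]:
    """Classification step: return every class whose requirements are met.

    The input is the annotation only; no prior class membership is used.
    """
    return sorted(
        name
        for name, required in rules.items()
        if all(annotation.get(group, 0) >= count
               for group, count in required.items())
    )


if __name__ == "__main__":
    annotation = annotate(["carboxylic_acid", "hydroxyl", "hydroxyl"])
    print(annotation)
    print(classify(annotation))
```

The point of the sketch is the direction of the data flow: the annotation is produced first and independently of any class, and classification then consumes it. A molecule may satisfy several class descriptions at once, which is why `classify` returns a list rather than a single label.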

Methods

Results

Discussion

Conclusion