Formalization, Annotation and Analysis of Diverse Drug and Probe Screening Assay Datasets Using the BioAssay Ontology (BAO)

Uma D Vempati, Kunie Sakurai, Stephan C Schürer, Vance P Lemmon, Ahsan Mir, Saminda Abeyruwan, Magdalena J Przydzial, Ubbo Visser, Caty Chung, Dermot Cox

doi:10.1371/journal.pone.0049198

Open Access

Journal: PloS one	Publication Date: Nov 14, 2012
Citations: 84	License type: CC BY 4.0

Affiliation: University of Miami, Neurological Surgery

Abstract

Huge amounts of high-throughput screening (HTS) data for probe and drug development projects are being generated in the pharmaceutical industry and, more recently, in the public sector. The resulting experimental datasets are increasingly being disseminated via publicly accessible repositories. However, existing repositories lack sufficient metadata to describe the experiments and are often difficult for non-experts to navigate. The lack of standardized descriptions and semantics of biological assays and screening results hinders targeted data retrieval, integration, aggregation, and analyses across different HTS datasets, for example to infer mechanisms of action of small molecule perturbagens. To address these limitations, we created the BioAssay Ontology (BAO). BAO has been developed with a focus on data integration and analysis, enabling the classification of assays and screening results by concepts that relate to format, assay design, technology, target, and endpoint. Previously, we reported on the higher-level design of BAO and on the semantic querying capabilities offered by the ontology-indexed triple store of HTS data. Here, we report on our detailed design, annotation pipeline, substantially enlarged annotation knowledgebase, and analysis results. We used BAO to annotate assays from the largest public HTS data repository, PubChem, and demonstrate its utility to categorize and analyze diverse HTS results from numerous experiments. BAO is publicly available from the NCBO BioPortal at http://bioportal.bioontology.org/ontologies/1533. BAO provides controlled terminology and uniform scope to report probe and drug discovery screening assays and results. BAO leverages description logic to formalize the domain knowledge and facilitate semantic integration with other diverse resources. As a consequence, BAO offers the potential to infer new knowledge from a corpus of assay results, for example molecular mechanisms of action of perturbagens.

Highlights

High-throughput screening (HTS) has become the most common approach to identify starting points for the development of novel drugs [1]
BioAssay Ontology (BAO) formally describes perturbation bioassays, such as small molecule HTS assays, for the purpose of categorizing the assays and the results by concepts that relate to the screening format, design, technology, target, and endpoints, and which are essential to interpret screening results in the context of a molecular [...]. Relationships were created to connect the classes and develop a knowledge representation of the biological assays and screening outcomes
‘Measure group’ is a class created to group experimental outcomes into result sets and enables the modeling of multiplexed and multi-parametric assays
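
To make the highlights above concrete, the following is a minimal sketch of how an assay annotation with measure groups might be represented and categorized. It is not BAO's actual schema or API: the class names, field names, assay identifiers, and example values are all hypothetical, and treating "more than one measure group" as a marker for a multiplexed assay is a simplifying assumption of this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Endpoint:
    """A reported screening outcome (hypothetical labels, not BAO term IDs)."""
    name: str
    unit: str


@dataclass
class MeasureGroup:
    """Groups the endpoints that belong to one result set."""
    label: str
    endpoints: list[Endpoint] = field(default_factory=list)


@dataclass
class AssayAnnotation:
    """An assay described by the concept categories named in the text."""
    assay_id: str
    assay_format: str
    design: str
    technology: str
    target: str
    measure_groups: list[MeasureGroup] = field(default_factory=list)

    def is_multiplexed(self) -> bool:
        # Assumption of this sketch: several measure groups = multiplexed.
        return len(self.measure_groups) > 1


def group_by(annotations, concept):
    """Return {concept value: [assay ids]} for one annotation category."""
    groups = defaultdict(list)
    for annotation in annotations:
        groups[getattr(annotation, concept)].append(annotation.assay_id)
    return dict(groups)


# Made-up example annotations; identifiers and values are illustrative only.
assays = [
    AssayAnnotation(
        "A1", "biochemical", "enzyme activity", "fluorescence", "kinase X",
        [MeasureGroup("inhibition", [Endpoint("IC50", "uM")])],
    ),
    AssayAnnotation(
        "A2", "cell-based", "reporter gene", "luminescence", "receptor Y",
        [
            MeasureGroup("reporter signal", [Endpoint("EC50", "uM")]),
            MeasureGroup("viability", [Endpoint("percent viability", "%")]),
        ],
    ),
    AssayAnnotation(
        "A3", "cell-based", "viability", "luminescence", "cell line Z",
        [MeasureGroup("viability", [Endpoint("CC50", "uM")])],
    ),
]

by_format = group_by(assays, "assay_format")
multiplexed_ids = [a.assay_id for a in assays if a.is_multiplexed()]
```

Here `by_format` collects assay identifiers under each format value, and `multiplexed_ids` lists the assays that carry more than one measure group. In the ontology itself, such categories are formal classes connected by relationships rather than plain string fields.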

Summary

Introduction

High-throughput screening (HTS) has become the most common approach to identify starting points for the development of novel drugs [1]. The Molecular Libraries Probe Production Centers Network (MLPCN), which is part of the NIH Molecular Libraries initiative, offers researchers “access to the large-scale screening capacity, along with medicinal chemistry and informatics necessary to identify chemical probes to study the functions of genes, cells, and biochemical pathways” [2]. An example of a very recent large-scale public screening effort is the NIH Library of Integrated Network-based Cellular Signatures (LINCS) program, which aims to develop a library of molecular signatures based on gene expression and other cellular changes in response to perturbing agents across a variety of cell types using various high-throughput screening approaches [4]. Other public resources to access screening data include ChEMBL, a database that contains structure-activity relationship (SAR) data curated from the medicinal chemistry literature [5], and the Psychoactive Drug Screening Program (PDSP), which generates data from screening novel psychoactive compounds for pharmacological activity [6]. Private resources, such as Collaborative Drug Discovery (CDD) [8], make large screening datasets publicly accessible.
