Abstract

Background: In recent years, the number of High Throughput Screening (HTS) assays deposited in PubChem has grown quickly. Consequently, the volume of both structured information (e.g., molecular structures and bioactivities) and unstructured information (such as descriptions of bioassay experiments) has been increasing exponentially. This growth makes it increasingly demanding to assemble bioactivity data efficiently, that is, to mine this large body of information in order to identify and interpret the relationships among diverse bioassay experiments. In this work, we propose a text-mining based approach for bioassay neighboring analysis that uses the unstructured text descriptions contained in the PubChem BioAssay database.

Results: The neighboring analysis is performed by evaluating the cosine score of each bioassay pair and the fraction of overlap with the human-curated neighbors. Our cosine score distribution analysis and assay neighbor clustering analysis on all PubChem bioassays suggest that strong correlations among bioassays can be identified from their conceptual relevance. A comparison with other existing assay neighboring methods suggests that the text-mining based approach provides meaningful linkages among PubChem bioassays and complements the existing methods by identifying additional relationships among bioassay entries.

Conclusions: Text-mining based bioassay neighboring analysis is efficient for correlating bioassays and for studying different aspects of a biological process, which is otherwise difficult with existing neighboring procedures because of the lack of specific annotations and structured information. We suggest that it can be used either as a standalone tool or as a complement to the PubChem bioassay neighboring process, enabling efficient integration of assay results and generating hypotheses about the bioactivities of the tested reagents.
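The abstract does not specify how the text is weighted before cosine scores are computed. As a minimal sketch, assuming plain TF-IDF term vectors built from each assay description, the pairwise cosine score and a simple score cutoff for declaring "text neighbors" could look like this. The assay IDs and descriptions are hypothetical, the smoothed IDF formula is an assumed choice, and `text_neighbors` with its threshold is a hypothetical helper, not the paper's stated rule.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase a description and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Build a sparse TF-IDF vector (term -> weight) for each assay description."""
    counts = {aid: Counter(tokenize(text)) for aid, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: how many descriptions contain each term.
    df = Counter(term for c in counts.values() for term in c)
    # Smoothed IDF (assumed weighting): rarer terms get larger weights, and
    # terms present in every description still keep a weight of 1.
    idf = {term: math.log((1 + n_docs) / (1 + d)) + 1 for term, d in df.items()}
    return {
        aid: {term: tf * idf[term] for term, tf in c.items()}
        for aid, c in counts.items()
    }


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine score between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_cosine(docs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Cosine score for every unordered assay pair (pair keys sorted by ID)."""
    vecs = tfidf_vectors(docs)
    ids = sorted(vecs)
    return {
        (a, b): cosine(vecs[a], vecs[b])
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
    }


def text_neighbors(scores: dict[tuple[str, str], float], threshold: float) -> set[tuple[str, str]]:
    """Hypothetical neighbor rule: keep pairs whose score meets a cutoff."""
    return {pair for pair, s in scores.items() if s >= threshold}


# Hypothetical assay descriptions (not real PubChem records).
descriptions = {
    "assay_1": "qHTS assay for inhibitors of the kinase target in a luciferase reporter cell line",
    "assay_2": "Confirmatory dose response assay for inhibitors of the kinase target",
    "assay_3": "Fluorescence polarization assay measuring binding to a nuclear receptor",
}

scores = pairwise_cosine(descriptions)
for pair, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(s, 3))
```

With these toy descriptions, the two kinase-inhibitor assays share most of their weighted terms and score well above the unrelated binding assay. A real pipeline would likely also remove stop words and boilerplate protocol phrases before weighting, since common words such as "assay" otherwise contribute to every score.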

Highlights

  • The number of High Throughput Screening (HTS) assays deposited in PubChem [1] has grown quickly in recent years

  • With the rapid growth of the PubChem BioAssay database, the ability to pool unstructured information, such as assay descriptions, from related biological tests has become increasingly important for gaining insight into biological processes


Summary

Introduction

The number of High Throughput Screening (HTS) assays deposited in PubChem has grown quickly. PubChem currently provides four methods for identifying relationships between bioassays, based respectively on 1) target information, 2) active compounds tested in common, 3) shared biological pathways, and 4) depositor annotations [1]. These methods have limitations: they depend either on unambiguous identification of the sequence information or molecular pathways of the assay targets, or on depositors supplying comprehensive annotations, which many bioassay records lack. Meanwhile, a large amount of meaningful information, such as the objectives of the assays and details of the experimental protocols, is stored as unstructured free text in the bioassay descriptions and is not used by the existing neighboring approaches.
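One way to compare text-derived neighbors with an existing neighbor set, such as relationships supplied through depositor annotations, is to measure what fraction of the curated pairs the text-based method also recovers, and to list the pairs it adds. The sketch below assumes neighbor relations are stored as unordered pairs of assay identifiers and uses the curated set as the denominator; both the identifiers and this exact definition of "overlap" are illustrative assumptions, not the paper's stated formula.

```python
from collections.abc import Iterable

Pair = tuple[str, str]


def as_unordered(pairs: Iterable[Pair]) -> set[frozenset[str]]:
    """Treat (a, b) and (b, a) as the same relation; drop self-pairs."""
    return {frozenset(p) for p in pairs if p[0] != p[1]}


def overlap_fraction(text_pairs: Iterable[Pair], curated_pairs: Iterable[Pair]) -> float:
    """Share of curated neighbor pairs that the text-based method also finds."""
    curated = as_unordered(curated_pairs)
    if not curated:
        return 0.0
    return len(as_unordered(text_pairs) & curated) / len(curated)


def additional_pairs(text_pairs: Iterable[Pair], curated_pairs: Iterable[Pair]) -> set[frozenset[str]]:
    """Text-derived relations that the curated neighbor set does not contain."""
    return as_unordered(text_pairs) - as_unordered(curated_pairs)


# Hypothetical neighbor pairs (not real PubChem AIDs).
text_pairs = [("assay_1", "assay_2"), ("assay_2", "assay_4"), ("assay_3", "assay_5")]
curated_pairs = [
    ("assay_2", "assay_1"),
    ("assay_3", "assay_5"),
    ("assay_4", "assay_6"),
    ("assay_1", "assay_7"),
]

print(overlap_fraction(text_pairs, curated_pairs))  # 0.5
print(additional_pairs(text_pairs, curated_pairs))
```

Here two of the four curated pairs are recovered, and the pair `assay_2`/`assay_4` is a relationship found only from the text, which is the kind of complementary linkage the abstract describes.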
