Abstract

Large chemical datasets store millions of compounds in the Simplified Molecular-Input Line-Entry System (SMILES) representation. Fragmenting SMILES strings into overlapping substrings of a defined size, called LINGO Profiles, avoids the otherwise time-consuming conversion process. One drawback of this process is the generation of numerous identical LINGO Profiles. Introduced by Kristensen et al., the inverted indexing approach is a modification intended to handle the large number of molecules residing in the database. Implementing this technique halved the storage space required by the dataset, while also achieving a significant speedup and favourable accuracy in similarity searching. This report presents an in-depth analysis of the results and draws conclusions about the effectiveness of the working prototype developed for this study.
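The fragmentation and inverted-indexing ideas described above can be illustrated with a minimal sketch. It is not the prototype evaluated in this study: the substring length q = 4, the function names, the tiny example database, and the multiset Tanimoto score are all assumptions made for illustration, and the SMILES preprocessing commonly applied before fragmentation (for example, normalising ring-closure digits) is omitted.

```python
from collections import Counter, defaultdict


def lingo_profile(smiles: str, q: int = 4) -> Counter:
    """Count every overlapping length-q substring of a SMILES string."""
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def build_inverted_index(database: dict, q: int = 4) -> dict:
    """Map each distinct substring to the set of molecule IDs containing it.

    Each distinct substring is stored once as a key, so repeats across
    molecules are not duplicated.
    """
    index = defaultdict(set)
    for mol_id, smiles in database.items():
        for lingo in lingo_profile(smiles, q):
            index[lingo].add(mol_id)
    return index


def tanimoto(a: Counter, b: Counter) -> float:
    """Multiset Tanimoto coefficient (assumed scoring function)."""
    shared = sum((a & b).values())  # element-wise minimum of counts
    total = sum((a | b).values())   # element-wise maximum of counts
    return shared / total if total else 0.0


def search(query: str, database: dict, index: dict, q: int = 4) -> list:
    """Score only molecules that share at least one substring with the query."""
    query_profile = lingo_profile(query, q)
    candidates = set()
    for lingo in query_profile:
        candidates |= index.get(lingo, set())
    scores = [
        (mol_id, tanimoto(query_profile, lingo_profile(database[mol_id], q)))
        for mol_id in candidates
    ]
    # Highest similarity first; ties broken by molecule ID.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical three-molecule database, for illustration only.
EXAMPLE_DB = {
    "aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O",
    "salicylic_acid": "OC(=O)C1=CC=CC=C1O",
    "caffeine": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
}

if __name__ == "__main__":
    example_index = build_inverted_index(EXAMPLE_DB)
    for mol_id, score in search(EXAMPLE_DB["aspirin"], EXAMPLE_DB, example_index):
        print(f"{mol_id}: {score:.3f}")
```

In this sketch the index restricts scoring to molecules that share at least one substring with the query, which is one plausible source of the speedup reported above; the actual storage layout and scoring used in the study may differ.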

Highlights

  • Rapid advances in technology over the past few years have allowed for many virtual screening experiments to be conducted extensively [1]

  • The query structure itself normally exhibits a potentially useful level of biological activity and might be, for example, a competitor's compound or a structurally novel hit from an initial high-throughput screening (HTS) experiment [7]

  • LINGO [11], a specialized algorithm based on the Simplified Molecular-Input Line-Entry System (SMILES), was introduced in the field because it offers the simplicity required for retrieving molecules from a database


Summary

Introduction

Rapid advances in technology over the past few years have allowed many virtual screening experiments to be conducted extensively [1]. The query structure itself normally exhibits a potentially useful level of biological activity and might be, for example, a competitor's compound or a structurally novel hit from an initial high-throughput screening (HTS) experiment [7]. Both the query and database molecules are characterized by descriptors. The search for compounds similar to a given target ligand structure and the search for compounds with defined biophysical profiles are two important principles in the modern drug discovery process [21]. Both tasks make use of molecular descriptors of differing complexity (atomic, topographic, substructural fingerprints, 3D, biophysical properties, etc.), leading to different representations of the same molecule [22]. SMILES-based kernels were found to be computationally faster and more flexible than their 2D competitors.
