ABSTRACT

The main goal of ligand-based virtual screening is to select and identify, from libraries or databases, a subset of compounds that are likely to possess a desired biological activity. The main challenge for such approaches is achieving high recall of active molecules. In this paper we present the fuzzy correlation coefficient (FCC), which is used as a similarity coefficient. The new approach assumes mutual dependence between molecular features, whereas the most common approaches (Tanimoto, Bayesian and other coefficients) assume that features are mutually independent. Our experiments show that the new coefficient increases the recall of active molecules in a high-diversity database compared with other correlation coefficients and the Tanimoto coefficient.

Keywords: Correlation coefficients, fingerprint features, similarity search, similarity coefficients, virtual screening.

1. INTRODUCTION

Virtual screening (VS) refers to the use of a computer-based method to process compounds from a library or database in order to identify and select those that are likely to possess a desired biological activity, such as the ability to inhibit the action of a particular therapeutic target. Selecting molecules with a virtual screening algorithm should yield a higher proportion of active compounds, as assessed by experiment, than a random selection of the same number of molecules [1]. VS is now widely used in the computer-based search for novel lead molecules. Typically, there are two approaches to the general problem: virtual screening by docking, which is used when the 3D structure of the biological target (protein or enzyme) involved in the disease is available, and similarity-based virtual screening, where no information on the protein is necessary; instead, structural information about one or more molecules known to bind to the protein is used as the structural query.
The screening procedure retrieves molecules from the database according to the molecular similarity principle, which states that structurally similar molecules exhibit similar biological activities. Similarity searches are now a standard tool for drug discovery. The idea behind such searches is that a compound with an interesting biological activity is used as a reference and compared with the other compounds in a database. The basic idea of similarity-based virtual screening is very simple and was first stated explicitly by Johnson and Maggiora [2]: the Similar Property Principle states that molecules that are structurally similar are likely to have similar properties. The main goal of any system for similarity-based screening is to quantify the degree of similarity or resemblance between the reference structure (target query or queries) and each of the structures in the database being screened, for both real and virtual screening. A similarity measure requires three components: the representation used to characterize the molecules being compared, the weighting scheme that prioritizes the importance of the various components of these representations, and the coefficient used to calculate the degree of similarity or relatedness between two structural representations.

This paper proposes a new ligand-based VS approach for similarity search. The new approach is based on the relationship between the features of the target molecules and the features of all molecules in the database. In the next section, the fuzzy text retrieval method is explained. In Section 3, we review related work in this area. Materials and methods are discussed in Section 4, including our proposed FCC method and all our experiments. In Section 5, results are presented, including an evaluation of the new method based on the recall of active molecules. Finally, our conclusions are presented in Section 6.
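To make the three components of a similarity measure concrete, the following minimal Python sketch ranks a toy database against a reference structure and then computes the recall of active molecules in the top k. It is an illustration only and does not implement FCC: it uses the standard Tanimoto coefficient on binary fingerprints, with every feature weighted equally. The fingerprints, molecule names, activity labels and function names (`tanimoto`, `rank_database`, `recall_at_k`) are all hypothetical and chosen for this example.

```python
from typing import Dict, FrozenSet, List, Set

# Representation (assumed): a binary fingerprint stored as the set of "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |a & b| / |a | b|; defined as 0.0 if both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_database(reference: Fingerprint,
                  database: Dict[str, Fingerprint]) -> List[str]:
    """Return molecule names sorted by decreasing similarity to the reference."""
    scored = sorted(database.items(),
                    key=lambda item: tanimoto(reference, item[1]),
                    reverse=True)
    return [name for name, _ in scored]


def recall_at_k(ranking: List[str], actives: Set[str], k: int) -> float:
    """Fraction of all known actives that appear among the top-k ranked molecules."""
    if not actives:
        return 0.0
    return len(set(ranking[:k]) & actives) / len(actives)


# Toy data (hypothetical): four database molecules, two of them labelled active.
reference = frozenset({1, 2, 3, 4})
database = {
    "mol_a": frozenset({1, 2, 3, 4, 5}),  # Tanimoto = 4/5
    "mol_b": frozenset({1, 2, 7}),        # Tanimoto = 2/5
    "mol_c": frozenset({8, 9}),           # Tanimoto = 0
    "mol_d": frozenset({2, 3, 4}),        # Tanimoto = 3/4
}
actives = {"mol_a", "mol_b"}

ranking = rank_database(reference, database)
print(ranking)
print(recall_at_k(ranking, actives, k=2))
```

In this toy case only one of the two actives is ranked in the top two, so the recall at k = 2 is 0.5. A coefficient that better captures dependence between features would aim to raise this figure for the same cutoff.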