Abstract

Fayyad defined the term “data mining” as the “nontrivial extraction of implicit, previously unknown, and potentially useful information from data, or the search for relationships and global patterns that exist in databases.” To extract information from huge quantities of data and to gain knowledge from this information, analysis and exploration have to be performed by automatic or semi-automatic methods. The data mining process can be divided into the following steps: selection, preprocessing, transformation, interpretation, and evaluation. For the final two steps in particular, the visual representation of data plays a pivotal role. Finding a lead by database mining requires analyzing the relationships between the structure of potential new drugs and their biological activity. Because of the amount of data to be processed, it is advisable to use a hierarchical representation of the chemical structures, starting from 1D fingerprints, proceeding to topological descriptors such as 2D autocorrelation, and finally considering 3D structures and molecular surface properties. A structure can be searched for in a database by string matching, as long as each compound has a unique Wiswesser line notation (WLN) or a unique simplified molecular input line entry system (SMILES) string. The SMILES arbitrary target specification (SMARTS) is based on the SMILES notation and is used to encode a query for substructure searches.
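As an illustration of the topological level of this hierarchy, the following is a minimal sketch of a 2D autocorrelation vector. It is not taken from the source; it assumes the common form in which the component for topological distance d sums the products of an atomic property over all atom pairs separated by d bonds. The function names, the use of atomic numbers as the property, and the hand-coded heavy-atom graph of ethanol are all assumptions made for the example.

```python
from collections import deque


def topological_distances(n_atoms, bonds):
    """Return the matrix of shortest bond-count distances (None if unreachable)."""
    adjacency = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    distances = []
    for start in range(n_atoms):
        # Breadth-first search from each atom over the bond graph.
        row = [None] * n_atoms
        row[start] = 0
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if row[neighbor] is None:
                    row[neighbor] = row[current] + 1
                    queue.append(neighbor)
        distances.append(row)
    return distances


def autocorrelation_2d(properties, bonds, max_distance):
    """Sum p_i * p_j over atom pairs (i <= j) at each topological distance."""
    n_atoms = len(properties)
    distances = topological_distances(n_atoms, bonds)
    vector = [0.0] * (max_distance + 1)
    for i in range(n_atoms):
        for j in range(i, n_atoms):
            d = distances[i][j]
            if d is not None and d <= max_distance:
                vector[d] += properties[i] * properties[j]
    return vector


# Hypothetical input: heavy atoms of ethanol (C-C-O), atomic number as property.
ethanol_props = [6, 6, 8]
ethanol_bonds = [(0, 1), (1, 2)]
ethanol_vector = autocorrelation_2d(ethanol_props, ethanol_bonds, 3)

# The same molecule with its atoms numbered in the opposite order (O-C-C).
renumbered_vector = autocorrelation_2d([8, 6, 6], [(0, 1), (1, 2)], 3)
```

Because the vector depends only on bond-count distances and not on atom numbering, both inputs give the same fixed-length descriptor. This makes such descriptors convenient for comparing structures of different sizes before moving on to more expensive 3D representations.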
