The discovery of protein and receptor targets through genomic research is expanding rapidly. Together with the automation of organic synthesis and biochemical screening, this is bringing a major change to the whole field of drug discovery research. In the traditional drug discovery process, the industry tests compounds in the thousands; with automated synthesis, the number of compounds to be tested could be in the millions. This two-dimensional expansion, in both targets and compounds, will create a major demand for resources unless chemical libraries are designed wisely. The objective of this work is to provide both a quantitative and a qualitative characterization of known drugs that will help to generate "drug-like" libraries. We analyzed the Comprehensive Medicinal Chemistry (CMC) database and seven subsets belonging to different classes of drug molecules. These include some central nervous system active drugs as well as drugs for the cardiovascular, cancer, inflammation, and infection disease states. We developed a quantitative characterization based on computed physicochemical property profiles such as log P, molar refractivity, molecular weight, and number of atoms, and a qualitative characterization based on the occurrence of functional groups and important substructures. For the CMC database, the qualifying range (covering more than 80% of the compounds) of the calculated log P is between -0.4 and 5.6, with an average value of 2.52. For molecular weight, the qualifying range is between 160 and 480, with an average value of 357. For molar refractivity, the qualifying range is between 40 and 130, with an average value of 97. For the total number of atoms, the qualifying range is between 20 and 70, with an average value of 48. Benzene is by far the most abundant substructure in this drug database, slightly more abundant than all the heterocyclic rings combined. Nonaromatic heterocyclic rings are twice as abundant as aromatic heterocycles. Tertiary aliphatic amines, alcoholic OH groups, and carboxamides are the most abundant functional groups in the drug database. The effective ranges of physicochemical properties presented here can be used in the design of drug-like combinatorial libraries as well as in developing a more efficient corporate medicinal chemistry library.
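
As an illustration only, the qualifying ranges quoted above could be applied as a simple library filter. The sketch below is not the authors' implementation. It assumes the four properties have already been computed by some external tool, that range bounds are inclusive, and that a compound should satisfy all four ranges at once; the abstract specifies none of these. The names `QUALIFYING_RANGES`, `out_of_range`, and `is_drug_like` are hypothetical.

```python
# Qualifying ranges quoted in the abstract for the CMC database
# (each range covers more than 80% of the compounds).
QUALIFYING_RANGES = {
    "log_p": (-0.4, 5.6),
    "molecular_weight": (160.0, 480.0),
    "molar_refractivity": (40.0, 130.0),
    "atom_count": (20, 70),
}


def out_of_range(props):
    """Return the names of properties outside their qualifying range.

    `props` maps each property name to a precomputed value.
    Bounds are treated as inclusive (an assumption of this sketch).
    """
    return [
        name
        for name, (low, high) in QUALIFYING_RANGES.items()
        if not low <= props[name] <= high
    ]


def is_drug_like(props):
    """True if every property lies inside its qualifying range (assumed rule)."""
    return not out_of_range(props)


# The CMC average values reported in the abstract fall inside every range.
cmc_average = {
    "log_p": 2.52,
    "molecular_weight": 357,
    "molar_refractivity": 97,
    "atom_count": 48,
}

# A made-up compound, used only to show a rejection.
hypothetical = {
    "log_p": 6.3,
    "molecular_weight": 612,
    "molar_refractivity": 125,
    "atom_count": 66,
}

print(is_drug_like(cmc_average))   # True
print(out_of_range(hypothetical))  # ['log_p', 'molecular_weight']
```

Returning the list of failing properties, rather than only a yes/no answer, makes it easier to see which range excludes a candidate when tuning a combinatorial library.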