Two experimental (log P, R(Mw)) and 17 calculated descriptors of molecular lipophilicity (fragmental, atom-based, or based on molecular properties) were investigated by multivariate analysis for a database of 159 compounds comprising both simple structures and more complex drug molecules. Principal component analysis (PCA) of the entire database shows clustering by chemical group; the tightness of the clustering corresponds to chemical similarity. Thus, diversity searching in databases might be performed effectively by PCA on the basis of calculated log P. The comparative validity check of experimental and computational procedures by regression analysis and PCA was performed with a chemically balanced, reduced data set (n = 55) representing 11 chemical groups with 5 members each. Regression of the experimental descriptors (log Poct versus R(Mw)) shows that chromatographic data, obtained under well-defined experimental conditions, can be used as valid substitutes for log P. Regression of calculated versus experimental lipophilicity data shows that fragmental methods are superior to atom-based methods and to approaches based on molecular properties, as indicated by correlation coefficients, slopes, and intercepts. In addition, PCA revealed that fragmental methods (Rekker-type, KOWWIN, KLOGP) capture the compound ranking in log P data to almost the same extent as experimental approaches. For atom-based procedures and CLOGP, both the comparability of absolute values and the ability to capture the compound ranking in the database are slightly poorer. This trend is more pronounced for the methods based on molecular properties, with the exception of BLOGP.
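The regression criteria named above (correlation coefficient, slope, intercept) can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit of calculated against experimental values, where a method that reproduces absolute values well would give a slope near 1, an intercept near 0, and r near 1. The function name `regress` and the numbers are hypothetical placeholders, not data or code from the study.

```python
from math import sqrt


def regress(x, y):
    """Least-squares fit y = slope * x + intercept, plus Pearson r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Made-up illustrative values, NOT data from the study.
experimental = [0.5, 1.0, 2.0, 3.0, 4.5]  # hypothetical measured log P
calculated = [0.7, 1.1, 2.2, 2.9, 4.4]    # hypothetical computed log P

slope, intercept, r = regress(experimental, calculated)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.4f}")
```

Comparing these three numbers across calculation methods is one simple way to rank them against the experimental reference; the compound-ranking comparison by PCA is a separate analysis not covered by this sketch.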