Property prediction by similarity of molecular structures—practical application and consistency analysis

Neima Brauner, Mordechai Shacham, Georgi St. Cholakov, Roumiana P. Stateva

doi:10.1016/j.ces.2005.03.069

Abstract

Linear dependency between the vectors of molecular descriptors of various compounds is exploited to obtain high-precision structure–structure correlations between a target compound and several predictive compounds. The linear structure–structure correlation is then used for property prediction and for consistency analysis of property data. Solid, liquid and gas phase properties can be predicted within the experimental error level. The method was applied to straight and branched alkane structures of increasing complexity. Adding more predictive compounds yields a more accurate model. However, there is a trade-off between the number of predictive compounds in the structure–structure correlation and the error accumulation in the property prediction stage. The number of predictive compounds needed increases with the complexity of the target structure. For simple targets, such as n-tetradecane, two predictive compounds are sufficient to obtain a high-precision structure–structure correlation, whereas for complex targets, such as pristane, seven aliphatic predictive compounds were required to obtain a medium-precision correlation. If property data are available for both the target and the predictive compounds, the prediction error can serve as a measure of the consistency of the data. In most of the cases studied, the consistency levels obtained for data taken from the DIPPR and NIST databases were higher than the reliability assigned by these sources. A few examples of inconsistent data are shown and potential causes of the inconsistency are provided. It is believed that the techniques presented will considerably advance property prediction and property consistency analysis, and will help in understanding the complex relationships between the molecular structure and the properties of pure compounds.
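
The idea summarized in the abstract can be illustrated with a minimal sketch. It assumes, as one plain reading of the text, that the target's descriptor vector is fitted by ordinary least squares as a linear combination of the predictive compounds' descriptor vectors, and that the same coefficients are then applied to the predictive compounds' property values. The function names, the absence of an intercept term, and all numbers below are hypothetical and synthetic; they are not taken from the paper or from the DIPPR or NIST databases.

```python
import numpy as np


def structure_structure_fit(target_desc, predictive_desc):
    """Fit target_desc ~ predictive_desc @ beta by ordinary least squares.

    target_desc: shape (n_descriptors,)
    predictive_desc: shape (n_descriptors, n_predictive), one column per compound
    Returns the coefficients and the residual vector of the fit.
    """
    beta, *_ = np.linalg.lstsq(predictive_desc, target_desc, rcond=None)
    residual = target_desc - predictive_desc @ beta
    return beta, residual


def predict_property(beta, predictive_props):
    """Apply the structure-structure coefficients to known property values."""
    return float(np.dot(beta, predictive_props))


# Synthetic toy series: every descriptor is affine in the carbon number n.
# These offsets and slopes are invented purely for illustration.
offsets = np.array([1.0, 2.0, 0.5, 3.0])
slopes = np.array([0.3, 1.1, 2.0, 0.7])


def toy_descriptors(n):
    return offsets + slopes * n


def toy_property(n):
    # Invented property that is also affine in n (not real data).
    return 10.0 + 2.0 * n


# Target: n = 14; predictive compounds: n = 13 and n = 15.
X = np.column_stack([toy_descriptors(13), toy_descriptors(15)])
beta, residual = structure_structure_fit(toy_descriptors(14), X)
predicted = predict_property(beta, np.array([toy_property(13), toy_property(15)]))

# If a reference value exists for the target, the prediction error can be
# read as a consistency indicator for the data set.
prediction_error = abs(predicted - toy_property(14))
```

In this toy case the residual of the fit is essentially zero and the predicted property matches the reference value, because both descriptors and property were constructed to vary linearly along the series. With real descriptors and more predictive compounds, the fit residual shrinks while errors in the individual property values accumulate in the prediction, which is the trade-off the abstract describes.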

Full Text