The present work deals with the generation of virtual samples as a mathematical model of empirical data, built on the basis of those same empirical data. The generated samples were used to develop a QSAR model. The method extrapolates sample vectors in such a manner that the distribution of the empirical data is conserved, and this conservation was judged with statistical parameters. The method was applied to the anticancer activity of Gossypol acetic acid against the BCL2 target in colorectal cancer. Using only the virtual samples for model development, model training gave a regression coefficient of 0.996 for leave-one-out cross-validation with 66 virtual samples, and a regression coefficient of 0.993 with the external test set data (51 samples). External test set data that were never used in virtual sample generation gave a predicted regression coefficient of >0.61. On the basis of the QSAR model, nine compounds were suggested as active anti-BCL2 compounds. The suggested compounds were further validated by a docking study alongside Gossypol acetic acid and 'Tetrahydroisoquinoline amide substituted phenyl pyrazole', the ligand cocrystallized with the chimeric BCL2-XL protein (PDB ID: 2W3L).
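The abstract does not state the exact extrapolation scheme, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes, hypothetically, that each virtual vector is produced by extending the line between two randomly chosen empirical descriptor vectors, and that "conservation of the distribution" means each descriptor's mean and standard deviation stay within a tolerance of the empirical values. The function names and the parameters `alpha` and `tol` are illustrative inventions.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

Vector = List[float]

# Hypothetical default: t ~ U(-alpha, 1 + alpha) with this alpha gives
# E[(1 - t)^2 + t^2] = 1 and E[t * (1 - t)] = 0, so the expected variance
# of the virtual points matches the empirical variance.
DEFAULT_ALPHA = (math.sqrt(3.0) - 1.0) / 2.0


def column_stats(samples: Sequence[Vector]) -> List[Tuple[float, float]]:
    """Return (mean, population std) for every descriptor column."""
    return [(statistics.fmean(c), statistics.pstdev(c)) for c in zip(*samples)]


def stats_conserved(empirical: Sequence[Vector], virtual: Sequence[Vector],
                    tol: float = 0.1) -> bool:
    """Assumed acceptance rule: for each column, the mean shift (in units of
    the empirical std) and the relative change in std must both be <= tol."""
    for (m_e, s_e), (m_v, s_v) in zip(column_stats(empirical), column_stats(virtual)):
        scale = s_e if s_e > 0 else 1.0  # guard against constant columns
        if abs(m_v - m_e) / scale > tol or abs(s_v - s_e) / scale > tol:
            return False
    return True


def generate_virtual_samples(empirical: Sequence[Vector], n: int,
                             alpha: float = DEFAULT_ALPHA, tol: float = 0.1,
                             seed: int = 0, max_tries: int = 1000) -> List[Vector]:
    """Build n virtual vectors x_i + t * (x_j - x_i) with t ~ U(-alpha, 1 + alpha),
    so points may extrapolate beyond the segment between two empirical samples.
    The whole batch is redrawn until the column statistics are conserved."""
    rng = random.Random(seed)
    pool = list(empirical)
    for _ in range(max_tries):
        batch: List[Vector] = []
        for _ in range(n):
            xi, xj = rng.sample(pool, 2)  # two distinct empirical vectors
            t = rng.uniform(-alpha, 1.0 + alpha)
            batch.append([a + t * (b - a) for a, b in zip(xi, xj)])
        if stats_conserved(pool, batch, tol):
            return batch
    raise RuntimeError("no batch conserved the empirical statistics")
```

The design choice illustrates why extrapolation, rather than pure interpolation, helps conserve the distribution: with t restricted to [0, 1], the variance of the virtual points shrinks to about two thirds of the empirical variance, whereas widening the range of t by `DEFAULT_ALPHA` on each side restores it in expectation while leaving the mean unchanged.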