In this paper, we use an ensemble machine learning technique to evaluate the strength of three supervised machine learning algorithms, namely random forest regression (RFR), support vector regression (SVR) and gradient boosting regression (GBR), in predicting the physical properties of mental disorder drugs from a small dataset. The model was implemented on a dataset in which neighborhood degree-based topological indices served as predictor variables and the physical properties of the drugs served as target variables. To compute the neighborhood degree-based indices, we employed an algorithm that uses the canonical SMILES notations of the drugs. The ensemble method identifies the neighborhood third Zagreb index (NM3(G)) as an efficient predictor of boiling point, flash point and enthalpy of vaporization. The neighborhood Randić index (NR(G)) gives better predictions for molar refractivity, molar volume and polarizability. In the same vein, the neighborhood sum connectivity index (NSC(G)) is an efficient predictor of surface tension, while the neighborhood reciprocal Randić index (NRR(G)) is most effective in predicting polar surface area. Furthermore, a comparison of the average performance of the ensemble method and the base models (RFR, SVR, GBR) over the neighborhood topological indices shows that the individual models perform efficiently across multiple physical properties of mental disorder drugs when the neighborhood topological indices are used as the predictor (input feature). Overall, this research highlights the combination of three supervised machine learning models in an ensemble environment as a way to mitigate the challenges that small datasets pose when machine learning models are applied in QSPR analysis.
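To make the predictor variables concrete, the following is a minimal sketch of how three of the neighborhood degree-based indices could be computed from a molecular graph. It is not the algorithm used in the paper: SMILES parsing is omitted, and the hydrogen-suppressed graph is supplied directly as an adjacency list. The index formulas are assumed to follow the definitions commonly used in the literature, where δ(v) is the sum of the degrees of the neighbors of v: NR(G) = Σ 1/√(δ(u)δ(v)), NRR(G) = Σ √(δ(u)δ(v)) and NSC(G) = Σ 1/√(δ(u)+δ(v)), each summed over the edges uv. The function names and the n-butane example are hypothetical.

```python
from math import sqrt


def neighborhood_degree_sums(adj):
    """Return delta(v): the sum of the degrees of the neighbors of each vertex v."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    return {v: sum(degree[u] for u in nbrs) for v, nbrs in adj.items()}


def edge_set(adj):
    """Return each undirected edge once, as a sorted (u, v) tuple."""
    return {tuple(sorted((u, v))) for u, nbrs in adj.items() for v in nbrs}


def neighborhood_indices(adj):
    """Compute NR, NRR and NSC under the assumed definitions stated above."""
    delta = neighborhood_degree_sums(adj)
    edges = edge_set(adj)
    return {
        "NR": sum(1 / sqrt(delta[u] * delta[v]) for u, v in edges),
        "NRR": sum(sqrt(delta[u] * delta[v]) for u, v in edges),
        "NSC": sum(1 / sqrt(delta[u] + delta[v]) for u, v in edges),
    }


# Hypothetical example: carbon skeleton of n-butane, the path 0-1-2-3.
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
delta = neighborhood_degree_sums(butane)
indices = neighborhood_indices(butane)
print(delta)
print(indices)
```

In a QSPR workflow of the kind described above, each drug would contribute one such row of index values, and these rows would form the feature matrix passed to the regression models.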