Predicting drug toxicity is an essential step in drug discovery and remains a significant challenge. Traditional machine learning algorithms struggle to capture the full scope of molecular structure features, which limits their effectiveness in toxicity prediction. Graph neural networks offer a promising alternative because they extract drug features directly from molecular graphs. However, existing graph learning algorithms fail to account for the interaction features between graph nodes and for the indirect edges connecting them. This paper proposes an enhanced graph neural network algorithm that employs multi-view features for each node, capturing the feature interactions between each node and its neighbors. In addition, the adjacency matrix is preprocessed to handle indirect edge interactions. A pooling technique is then applied to aggregate node features, followed by normalization and an activation layer. To further enhance the proposed algorithm, multi-scale attention is applied to learn graph features at different scales, using attention weights to capture intricate relationships among node feature vectors. The proposed algorithm is evaluated on eight toxicity datasets covering binary classification, multi-task multi-class classification, and regression. The Tox21, AMES, Skin reaction, Carcinogens, and DILI datasets are used for binary classification, the ToxCast dataset for multi-task multi-class classification, and the LD50 and hERG datasets for regression. The proposed algorithm is compared with five well-known algorithms: Graph Convolution Network, Graph Attention Network, Graph Isomorphism Network, Enhanced Graph Isomorphism Network, and Graph Total Variation. For the classification tasks, the proposed algorithm achieves ROC-AUC scores of 0.752 for Tox21, 0.775 for AMES, 0.707 for Skin reaction, 0.845 for Carcinogens, 0.92 for DILI, and 0.691 for ToxCast. For the regression tasks, it attains mean squared errors of 0.896 for the LD50 dataset and 0.766 for the hERG dataset. These results demonstrate an improvement over the compared algorithms across all evaluated datasets.
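The abstract does not say how the adjacency matrix is preprocessed or which pooling is used, so the following is only a minimal sketch of one common way to expose indirect edges, not the paper's method. It assumes that "indirect" means atoms two bonds apart, adds those pairs to a binary adjacency matrix, and then mean-pools neighbor features followed by a ReLU activation. The function names `augment_with_two_hop` and `mean_aggregate` are hypothetical and chosen for illustration.

```python
import numpy as np

def augment_with_two_hop(adj: np.ndarray) -> np.ndarray:
    """Return a binary adjacency matrix that also links atoms two bonds apart.

    Assumption: indirect edges are modelled as two-hop neighbours.
    """
    adj = (adj > 0).astype(int)
    two_hop = adj @ adj  # entry (i, j) counts length-2 walks from i to j
    combined = ((adj + two_hop) > 0).astype(int)
    np.fill_diagonal(combined, 0)  # drop self-loops created by i -> j -> i walks
    return combined

def mean_aggregate(adj: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Mean-pool each node's own and neighbour features, then apply ReLU."""
    with_self = adj + np.eye(adj.shape[0], dtype=adj.dtype)  # include the node itself
    degree = with_self.sum(axis=1, keepdims=True)
    pooled = (with_self @ features) / degree
    return np.maximum(pooled, 0.0)

# Toy molecular graph: a three-atom chain 0 - 1 - 2.
chain = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]])
augmented = augment_with_two_hop(chain)

# Hypothetical two-dimensional node features, one row per atom.
node_features = np.array([[1.0, 0.0],
                          [0.0, 1.0],
                          [1.0, 1.0]])
pooled = mean_aggregate(augmented, node_features)
```

In this toy chain the two end atoms are not bonded directly, but the augmented matrix connects them, so each atom aggregates information from the whole molecule in a single step. A real model would add learned weight matrices and the multi-view and multi-scale attention components described above.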