Abstract

Techniques from information and computer science, such as machine learning and graph theory, are applied in chemoinformatics to discover the properties of chemical compounds. This paper presents a new algorithm based on the two-class support vector machine (SVM) model, with new kernel functions for paths of features, that enables the prediction of chemical compound activity. First, we extract all paths of features (star subgraphs) of certain lengths and encode them according to their structure in the graphs. We then use these codes to construct two relationship matrices between the paths; these matrices contain the common and the different sub-paths between paths of stars. The number of sub-paths/paths for each compound is passed to the proposed kernel functions in the two-class SVM to predict the activity of chemical compounds. The relationship matrices created by the proposed algorithm help to reduce the number of features, which improves prediction accuracy. We apply the proposed algorithm with and without feature selection to two benchmark datasets: the monoamine oxidase (MAO) dataset and the AIDS antiviral screen dataset of active compounds, which contain 68 and 2000 chemical compounds, respectively. We compare the proposed kernel functions with many other two-class SVM prediction methods. Before feature selection, the prediction accuracies are 94% and 99.5% for MAO and AIDS, respectively. After feature selection, they are 96% and 99.5% for MAO and AIDS, respectively.
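To make the general idea of "path counts passed to a kernel function" concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes plain atom-labelled graphs rather than star subgraphs, omits the relationship matrices, and swaps in a standard histogram-intersection kernel for the proposed kernel functions. The names `label_paths`, `intersection_kernel`, and the toy molecules are hypothetical and chosen only for illustration.

```python
from collections import Counter


def label_paths(adjacency, labels, length):
    """Count label sequences of simple paths with `length` edges (length >= 1)."""
    counts = Counter()

    def extend(path):
        if len(path) == length + 1:
            seq = tuple(labels[v] for v in path)
            # Store one canonical orientation so both directions share a key.
            counts[min(seq, seq[::-1])] += 1
            return
        for nxt in adjacency[path[-1]]:
            if nxt not in path:  # keep paths simple (no repeated vertex)
                extend(path + [nxt])

    for start in adjacency:
        extend([start])
    # Each undirected path was reached once from each end, so halve the counts.
    return Counter({seq: c // 2 for seq, c in counts.items()})


def intersection_kernel(a, b):
    """Histogram-intersection kernel: sum of the smaller count per shared path."""
    return sum(min(a[seq], b[seq]) for seq in a.keys() & b.keys())


# Toy heavy-atom skeletons (assumed for illustration, not from the datasets).
ethanol = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "C", 2: "O"})
propanol = ({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}, {0: "C", 1: "C", 2: "C", 3: "O"})
methylamine = ({0: [1], 1: [0]}, {0: "C", 1: "N"})

ethanol_paths = label_paths(*ethanol, length=1)
propanol_paths = label_paths(*propanol, length=1)
methylamine_paths = label_paths(*methylamine, length=1)

features = [ethanol_paths, propanol_paths, methylamine_paths]
# Gram matrix of pairwise kernel values between the three compounds.
gram = [[intersection_kernel(a, b) for b in features] for a in features]
print(gram)
```

A Gram matrix of this kind could then be handed to a two-class SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, together with the active/inactive labels of the compounds.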
