Anticancer Peptides Classification Using Kernel Sparse Representation Classifier

Ehtisham Fazal, Muhammad Sohail Ibrahim, Imran Naseem, Seongyong Park, Abdul Wahab

doi:10.1109/access.2023.3246927

Abstract

Cancer is one of the most challenging diseases because of its complexity, variability, and diversity of causes. It has been one of the major research topics over the past decades, yet it is still poorly understood. To this end, multifaceted therapeutic frameworks are indispensable. Anticancer peptides (ACPs) are the most promising treatment option, but their large-scale identification and synthesis require reliable prediction methods, which is still a problem. In this paper, we present an intuitive classification strategy that differs from the traditional black-box method and is based on the well-known statistical theory of sparse-representation classification (SRC). Specifically, we create over-complete dictionary matrices by embedding the composition of the K-spaced amino acid pairs (CKSAAP). Unlike the traditional SRC frameworks, we use an efficient matching pursuit solver instead of the computationally expensive basis pursuit solver in this strategy. Furthermore, the kernel principal component analysis (KPCA) is employed to cope with non-linearity and dimension reduction of the feature space whereas the synthetic minority oversampling technique (SMOTE) is used to balance the dictionary.
The proposed method is evaluated on two benchmark datasets using well-known statistical measures and is found to outperform existing methods. The results show the highest sensitivity with the most balanced accuracy, which might be beneficial in understanding structural and chemical aspects and developing new ACPs. A Google Colab implementation of the proposed method is available on GitHub (https://github.com/ehtisham-Fazal/ACP-Kernel-SRC).
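
To make the pipeline described in the abstract more concrete, the following is a minimal, dependency-free sketch of two of its steps: CKSAAP feature extraction and sparse-representation classification with a greedy solver. It is an illustration under stated assumptions, not the authors' implementation. All function names (`cksaap`, `matching_pursuit`, `src_predict`), the gap limit `k_max=2`, the iteration count, and the toy sequences are hypothetical. The solver shown is plain matching pursuit, since the abstract does not say which matching pursuit variant is used, and the KPCA and SMOTE steps are omitted entirely.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(seq, k_max=2):
    """Frequencies of residue pairs separated by k residues, for k = 0..k_max.

    Returns a list of 400 * (k_max + 1) values. k_max=2 is an assumed default.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_windows = len(seq) - k - 1
        for i in range(max(n_windows, 0)):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # ignore pairs with non-standard residues
                counts[pair] += 1
        total = max(n_windows, 1)  # avoid division by zero on short sequences
        features.extend(counts[p] / total for p in PAIRS)
    return features


def normalize(v):
    """Scale a vector to unit Euclidean norm (zero vectors are returned as-is)."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(atoms, y, n_iter=10):
    """Greedy sparse code of y over unit-norm atoms (plain matching pursuit)."""
    coef = [0.0] * len(atoms)
    residual = list(y)
    for _ in range(n_iter):
        corr = [dot(a, residual) for a in atoms]
        j = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        if abs(corr[j]) < 1e-12:  # residual is already explained
            break
        coef[j] += corr[j]
        residual = [r - corr[j] * a for r, a in zip(residual, atoms[j])]
    return coef


def src_predict(atoms, labels, y, n_iter=10):
    """Pick the class whose atoms alone reconstruct y with the smallest error."""
    y = normalize(y)
    coef = matching_pursuit(atoms, y, n_iter)
    errors = {}
    for c in set(labels):
        recon = [0.0] * len(y)
        for a, lab, w in zip(atoms, labels, coef):
            if lab == c and w != 0.0:
                recon = [r + w * x for r, x in zip(recon, a)]
        errors[c] = sqrt(sum((p - q) ** 2 for p, q in zip(y, recon)))
    return min(errors, key=errors.get)


# Toy dictionary: made-up strings for illustration, NOT real peptides or data.
train = [("KKLLKKLLKKLL", "ACP"), ("KLAKLAKKLAKL", "ACP"),
         ("GSGSGSTSGSGS", "non-ACP"), ("DEDEDSGDEDES", "non-ACP")]
atoms = [normalize(cksaap(s)) for s, _ in train]
labels = [lab for _, lab in train]
prediction = src_predict(atoms, labels, cksaap("KKLLKKLAKKLL"))
```

In this sketch the dictionary columns are the unit-normalized CKSAAP vectors of the labeled training sequences, and a query is assigned to the class with the smallest class-wise reconstruction residual, which is the standard SRC decision rule. A fuller pipeline in the spirit of the abstract would project the features with KPCA and balance the dictionary with SMOTE before this step.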

Full Text