Abstract
Cancer is the second leading cause of death globally, and the use of therapeutic peptides to target and kill cancer cells has received considerable attention in recent years. Identifying anticancer peptides (ACPs) through wet-lab experimentation is expensive and often time-consuming. An efficient computational method is therefore needed to identify potential ACP candidates before in vitro experimentation. In this study, we developed support vector machine (SVM)- and random forest (RF)-based machine-learning methods that predict ACPs from features calculated from the amino acid sequence: amino acid composition, dipeptide composition, atomic composition, and physicochemical properties. We trained our methods on the Tyagi-B dataset and determined the model parameters by 10-fold cross-validation. We then evaluated the methods on two benchmarking datasets. On these datasets, the RF-based method outperformed existing methods, with an average accuracy of 88.7% and an average Matthews correlation coefficient (MCC) of 0.78. To assist the scientific community, we also developed a publicly accessible web server at www.thegleelab.org/MLACP.html.
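Accuracy and MCC are standard confusion-matrix metrics, and 10-fold cross-validation partitions the training data into ten folds. The minimal Python sketch below illustrates both under stated assumptions. The function names (`accuracy`, `mcc`, `stratified_folds`) are hypothetical, and the stratified round-robin fold assignment is an illustrative choice rather than the authors' documented procedure. No SVM or RF is trained here.

```python
import math
import random


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all peptides (ACPs and non-ACPs) that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient in [-1, 1].

    Returns 0.0 when any row or column of the confusion matrix is empty,
    a common convention for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def stratified_folds(labels: list[int], k: int = 10, seed: int = 0) -> list[list[int]]:
    """Split sample indices into k folds that preserve the ACP/non-ACP ratio.

    Each class is shuffled with a fixed seed and then dealt round-robin
    across the folds, so every fold gets an (almost) equal share of each class.
    """
    rng = random.Random(seed)
    folds: list[list[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for j, idx in enumerate(members):
            folds[j % k].append(idx)
    return folds
```

Consider a purely hypothetical fold with 40 true positives, 35 true negatives, 10 false positives, and 15 false negatives. That fold gives `accuracy(40, 35, 10, 15) == 0.75` and an MCC of about 0.50. In 10-fold cross-validation, each fold in turn serves as the test set while a model is trained on the remaining nine. The per-fold metrics are then averaged, or the counts are pooled.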
Highlights
Cancer is not a single disease but a heterogeneous group of complex diseases characterized by uncontrolled cell growth and the ability to spread rapidly to, or invade, other parts of the body
We developed machine-learning (ML)-based methods to predict anticancer peptides (ACPs), collectively named MLACP. These are a support vector machine (SVM)-based predictor (SVMACP) and a random forest (RF)-based predictor (RFACP). Both use combinations of features calculated from the peptide sequence: amino acid composition (AAC), dipeptide composition (DPC), atomic composition (ATC), and physicochemical properties (PCP)
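Several of the sequence features listed above can be illustrated with the following minimal Python sketch, which assumes the common definitions. AAC gives 20 residue percentages and DPC gives 400 overlapping-pair percentages. The two physicochemical descriptors (mean Kyte-Doolittle hydropathy and a crude net charge) are stand-ins chosen for illustration. MLACP's actual PCP set is not specified here, and atomic composition (ATC) is omitted. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides (AA, AC, ..., YY).
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]

# Kyte-Doolittle hydropathy scale: an illustrative physicochemical property,
# assumed here; not necessarily one of the properties used by MLACP.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _clean(seq: str) -> str:
    """Upper-case the sequence and reject empty input or non-standard residues."""
    seq = seq.upper()
    if not seq or any(res not in AMINO_ACIDS for res in seq):
        raise ValueError(f"expected a non-empty sequence of standard residues: {seq!r}")
    return seq


def aac(seq: str) -> list[float]:
    """Amino acid composition: percentage of each of the 20 residues (20 values)."""
    seq = _clean(seq)
    return [100.0 * seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dpc(seq: str) -> list[float]:
    """Dipeptide composition: percentage of each overlapping residue pair (400 values)."""
    seq = _clean(seq)
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = len(seq) - 1
    if total == 0:  # a single residue has no dipeptides
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * counts[d] / total for d in DIPEPTIDES]


def pcp(seq: str) -> list[float]:
    """Mean hydropathy, plus net charge counted as (#K + #R) - (#D + #E),
    ignoring His and the peptide termini."""
    seq = _clean(seq)
    hydropathy = sum(KYTE_DOOLITTLE[a] for a in seq) / len(seq)
    charge = sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")
    return [hydropathy, float(charge)]


def feature_vector(seq: str) -> list[float]:
    """Concatenate AAC (20) + DPC (400) + PCP (2) into one 422-dimensional vector."""
    return aac(seq) + dpc(seq) + pcp(seq)
```

Vectors like these, one per peptide, would then serve as input to an SVM or RF classifier. Which feature combinations to concatenate is a design choice that the abstract leaves open.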
Summary
Cancer is not a single disease but a heterogeneous group of complex diseases characterized by uncontrolled cell growth and the ability to spread rapidly to, or invade, other parts of the body. Existing methods use properties extracted from the primary sequence as input features to a support vector machine (SVM) when building a prediction model. These properties include amino acid composition (AAC), binary profile, dipeptide composition (DPC), and Chou's pseudo-amino acid composition (PseAAC), and each method uses them separately. All of these methods rely on the same machine-learning (ML) algorithm. Two of them, the method of Hajisharifi et al. (2014) and iACP, were also developed on the same dataset. We developed a web tool to assist the scientific community working in the field of ACP therapeutics and biomedical research.
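The binary profile feature used by existing methods one-hot encodes individual residues rather than summarizing composition. The Python sketch below assumes a common variant that encodes only the first `n_term` and last `c_term` residues. The window sizes (5 each), the zero-padding rule for short peptides, and the function names are all illustrative assumptions, not the exact scheme of the cited methods.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue: str) -> list[int]:
    """20-bit indicator vector with a single 1 at the residue's alphabet position."""
    bits = [0] * len(AMINO_ACIDS)
    bits[AMINO_ACIDS.index(residue)] = 1
    return bits


def _encode(window: str, length: int, pad_left: bool) -> list[int]:
    """One-hot encode a terminal window, zero-padding it to `length` residues."""
    bits = [b for res in window for b in one_hot(res)]
    padding = [0] * (len(AMINO_ACIDS) * (length - len(window)))
    return padding + bits if pad_left else bits + padding


def binary_profile(seq: str, n_term: int = 5, c_term: int = 5) -> list[int]:
    """Binary profile of the first `n_term` and last `c_term` residues.

    Output length is 20 * (n_term + c_term). Short peptides are zero-padded
    after the N-terminal window and before the C-terminal window, so each
    terminal residue always occupies the same slot.
    """
    seq = seq.upper()
    if any(res not in AMINO_ACIDS for res in seq):
        raise ValueError(f"non-standard residue in {seq!r}")
    n_window = seq[:n_term]
    c_window = seq[-c_term:] if c_term > 0 else ""  # seq[-0:] would be the whole string
    return _encode(n_window, n_term, pad_left=False) + _encode(c_window, c_term, pad_left=True)
```

Unlike AAC or DPC, this encoding preserves residue order near the termini but ignores the middle of longer peptides. That trade-off is one reason such features are often combined with composition-based ones.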