Abstract
The overuse of antibiotics has driven a rapid rise in antibiotic-resistant strains, raising concern that existing antibiotics will become ineffective against pathogens. Using cell wall lytic enzymes to destroy bacteria has become a viable alternative for addressing the antimicrobial resistance crisis. In this paper, we propose an improved method for predicting cell wall lytic enzymes. The amino acid composition (AAC), the dipeptide composition (DC), the position-specific score matrix auto-covariance (PSSM-AC), and the auto-covariance average chemical shift (acACS) were selected as features and used with a support vector machine (SVM). To overcome the imbalanced-data classification problem, the synthetic minority over-sampling technique (SMOTE) was used to balance the dataset; to remove redundant or irrelevant features, the F-score was used for feature selection. With the optimized combination feature AAC+DC+acACS+PSSM-AC, the sensitivity (Sn), specificity (Sp), Matthews correlation coefficient (MCC), and accuracy (Acc) were 99.35%, 99.02%, 0.98, and 99.19%, respectively, in the jackknife test. These values were higher than those of existing methods. This improved method may be helpful for protein function prediction.
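As an illustration of the two simplest descriptors named above, the following minimal sketch computes AAC (20 values) and DC (400 values) for a single sequence. It assumes the usual definitions (residue and adjacent-pair frequencies over the 20 standard amino acids); the function names and the handling of non-standard residues are assumptions made for this example, not details taken from the paper. PSSM-AC and acACS are not shown because they require external profiles and chemical shift tables.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids; the ordering here is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq):
    """Amino acid composition: frequency of each standard residue (20 values)."""
    # Assumption: non-standard symbols (e.g. X, B, Z) are simply ignored.
    counts = Counter(r for r in seq.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def dc(seq):
    """Dipeptide composition: frequency of each adjacent residue pair (400 values)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Assumption: a pair is counted only if both residues are standard.
    counts = Counter(p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def combined_features(seq):
    """Concatenate AAC and DC into one 420-dimensional vector."""
    return aac(seq) + dc(seq)
```

Calling `combined_features` on each sequence would give one fixed-length row per protein, which is the form an SVM expects regardless of sequence length.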
Highlights
Bacteria are constantly around us, and bacterial infections have become a major public health problem
The benchmark dataset was generated by Chen et al (2016). It was taken from the Universal Protein Resource (UniProt), and the sequences were collected using the following steps: (1) sequences annotated with “Inferred from homology” or “Predicted” were removed
The sensitivity (Sn), Matthews correlation coefficient (MCC), and accuracy (Acc) obtained with the amino acid composition (AAC) were all higher than those obtained with the dipeptide composition (DC). Because DC contains redundant or irrelevant features, we used the F-score to select features
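The F-score ranking mentioned above can be sketched as follows. This excerpt does not give the exact formula used, so the sketch assumes the definition commonly used for SVM feature ranking: for each feature, the squared distances of the two class means from the overall mean, divided by the sum of the two within-class sample variances. The function names and the `top_k` parameter are hypothetical.

```python
def f_scores(pos, neg):
    """Return one F-score per feature.

    pos, neg: lists of equal-length feature vectors for the positive and
    negative class. Each class needs at least two samples.
    """
    n_pos, n_neg = len(pos), len(neg)
    scores = []
    for j in range(len(pos[0])):
        p = [row[j] for row in pos]
        q = [row[j] for row in neg]
        mean_p = sum(p) / n_pos
        mean_q = sum(q) / n_neg
        mean_all = (sum(p) + sum(q)) / (n_pos + n_neg)
        # Between-class separation of the two class means.
        between = (mean_p - mean_all) ** 2 + (mean_q - mean_all) ** 2
        # Sum of the within-class sample variances.
        within = (sum((v - mean_p) ** 2 for v in p) / (n_pos - 1)
                  + sum((v - mean_q) ** 2 for v in q) / (n_neg - 1))
        # Assumption: a feature that is constant within both classes scores 0.
        scores.append(between / within if within > 0 else 0.0)
    return scores


def select_top_features(pos, neg, top_k):
    """Indices of the top_k features, highest F-score first."""
    scores = f_scores(pos, neg)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:top_k]
```

A larger score suggests the feature separates the two classes better. One plausible use is to add features in ranked order and keep the subset that gives the best cross-validated accuracy.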
Summary
Bacteria are constantly around us, and bacterial infections have become a major public health problem. Ding et al (2009) used Chou’s amphiphilic pseudo amino acid composition to predict cell wall lytic enzymes; the predictive accuracy was 80.40% with the jackknife test. Chen et al (2016) developed a predictor called “Lypred” that used pseudo amino acid composition (PseAAC) as a feature vector; the predictive accuracy was 91.3% with fivefold cross-validation. Meng et al (2020) developed a predictor called “CWLy-SVM” that employed a 473-dimensional sequence-based feature descriptor to predict cell wall lytic enzymes; the reported result was 95.50% with the jackknife test. In this study, the amino acid composition (AAC), the dipeptide composition (DC), the position-specific score matrix auto-covariance (PSSM-AC), and the auto-covariance average chemical shift (acACS) were used to predict cell wall lytic enzymes on the same dataset as that used by Chen et al (2016). With the optimized combination feature AAC+DC+PSSM-AC+acACS, the accuracy (Acc) was 99.19% on the balanced dataset in the jackknife test.
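The four reported measures can all be derived from the counts of a binary confusion matrix. The sketch below assumes the standard definitions of Sn, Sp, Acc, and MCC. The function name, the argument names, and the convention of returning 0 for an undefined MCC are choices made for this example.

```python
import math


def evaluate(tp, fn, tn, fp):
    """Compute Sn, Sp, Acc, and MCC from confusion-matrix counts.

    tp/fn: lytic enzymes predicted correctly/incorrectly.
    tn/fp: non-lytic proteins predicted correctly/incorrectly.
    """
    sn = tp / (tp + fn)                    # sensitivity (recall on positives)
    sp = tn / (tn + fp)                    # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: MCC is reported as 0 when its denominator is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}
```

In a jackknife (leave-one-out) test, each sample is predicted by a model trained on all the others, and the counts passed to `evaluate` are accumulated over those predictions.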