Abstract

Background: Lectins are a diverse group of glycoproteins or glycoconjugate proteins that can be extracted from plants, invertebrates and higher animals. Cancerlectins, a kind of lectins, which play a key role in the process of tumor cells interacting with each other and are being employed as therapeutic agents. A full understanding of cancerlectins is significant because it provides a tool for the future direction of cancer therapy. Objective: To develop an accurate and practically useful timesaving tool to identify cancerlectins. A novel sequence-based method is proposed along with a correlative webserver to access the proposed tool. Methods: Firstly, protein features were extracted in a newly feature building way termed, g-gap tripeptide composition. After which a proposed cascade linear discriminant analysis (Cascade LDA) is used to alleviate the high dimensional difficulties with the Analysis Of Variance (ANOVA) as a feature importance criterion. Finally, Support Vector Machine (SVM) is used as the classifier to identify cancerlectins. Results: The proposed method achieved an accuracy of 91.34% with sensitivity of 89.89%, specificity of 92.48% and an 0.8318 Mathew’s correlation coefficient based on only 13 fusion features in jackknife cross validation, the result of which is superior to other published methods in this domain. Conclusion: In this study, a new method based only on primary structure of protein is proposed and experimental results show that it could be a promising tool to identify cancerlectins. An openaccess webserver is made available in this work to facilitate other related works.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call