Depending on the relative numbers and spatial arrangement of tryptophan (Trp; W) and tyrosine (Tyr; Y) residues, different proteins produce distinct autofluorescence (AF) spectral shapes when excited at ∼280 nm. Yet, given the vast number and structural heterogeneity of proteins in nature, visual analysis and precise identification of proteins from their AF spectra are challenging. The difficulty increases when different proteins produce substantially similar AF spectral shapes. There is, thus, a clear need for a methodology that addresses this problem. The current study proposes a practical approach that uses machine learning (ML) algorithms to rapidly identify proteins from their AF spectra. Specifically, AF spectra of fifteen standard proteins of varying origin, with distinct structures and Trp/Tyr compositions, were recorded. Using spectral features selected by the Minimum-Redundancy-Maximum-Relevance (mRMR) algorithm, multiclass Support Vector Machine (SVM) models with Radial Basis Function (RBF), polynomial, and linear kernels classified the proteins with accuracies of 99.06%, 99.03%, and 98.29%, respectively. Since protein identification is key to understanding biological function and diagnosing disease, the proposed methodology could offer a viable alternative to, and an improvement on, existing protein identification techniques.
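The abstract does not state which mRMR variant or discretization scheme was used. As a minimal sketch, assuming the common mutual-information-difference (MID) criterion (relevance I(f; class) minus the mean I(f; s) over already-selected features s) and equal-width binning of intensities, greedy mRMR selection over spectral features can be written in plain Python. The function names, the toy intensity values, and the labels `protein_A`/`protein_B` are hypothetical illustrations, not data or code from the study; each column stands in for the AF intensity at one emission wavelength.

```python
import math
from collections import Counter


def discretize(values, n_bins=2):
    """Map continuous intensities to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value falls exactly on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y):
    """I(X;Y) in nats for two equally long sequences of discrete labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # p(a,b) * ln(p(a,b) / (p(a) p(b))), with the 1/n factors simplified.
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def mrmr(X, y, k, n_bins=2):
    """Greedy mRMR (MID variant, assumed): return indices of k selected columns of X."""
    n_features = len(X[0])
    cols = [discretize([row[j] for row in X], n_bins) for j in range(n_features)]
    relevance = [mutual_information(col, y) for col in cols]
    # Start from the single most relevant feature.
    selected = [max(range(n_features), key=lambda j: relevance[j])]
    while len(selected) < min(k, n_features):
        best, best_score = None, -math.inf
        for j in range(n_features):
            if j in selected:
                continue
            # Penalize features that share information with those already chosen.
            redundancy = sum(mutual_information(cols[j], cols[s])
                             for s in selected) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


# Hypothetical toy "spectra": 8 samples x 4 wavelengths (normalized intensities).
X = [
    [0.20, 0.21, 0.30, 0.50],
    [0.25, 0.19, 0.28, 0.61],
    [0.22, 0.23, 0.71, 0.49],
    [0.18, 0.76, 0.33, 0.62],
    [0.80, 0.82, 0.69, 0.51],
    [0.85, 0.79, 0.74, 0.60],
    [0.78, 0.84, 0.27, 0.48],
    [0.24, 0.26, 0.70, 0.63],
]
y = ["protein_A"] * 4 + ["protein_B"] * 4

print(mrmr(X, y, k=3))  # prints [0, 2, 3]
```

In this toy set, column 1 closely tracks column 0, so mRMR passes over it in favor of columns 2 and 3, which carry less class information on their own but overlap little with what is already selected. The selected columns could then be passed to a multiclass classifier such as scikit-learn's `SVC`, whose `kernel` parameter accepts `"rbf"`, `"poly"`, and `"linear"`; that step is not shown here.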