Surface-enhanced Raman spectroscopy (SERS) has been shown to be an effective method for elucidating the secondary structural characteristics of DNA. However, the inherent complexity of DNA conformations and the scarcity of SERS samples make it challenging to identify the many possible secondary structures. To address these issues, a method integrating machine learning with SERS was developed to analyze the SERS spectra of 54 oligonucleotides with well-defined conformations, namely G-quadruplex (G4), i-motif (iM), double-strand (DS), and single-strand (SS) configurations. Principal component analysis (PCA) separated the oligonucleotides into three distinct conformational groups (G4s, iMs, and others). Linear discriminant analysis (LDA), K-nearest neighbor (KNN), and support vector machine (SVM) classifiers were then applied to improve the classification accuracy for the 54 training sequences. The trained models correctly classified the structures of five untrained sequences and identified the predominant conformations (G4, iM, and DS) formed by two complementary G-rich and C-rich sequences under acidic and neutral pH conditions. These results demonstrate the potential of the proposed methodology for rapid screening and prediction of DNA secondary conformations.
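The workflow described above (PCA for exploratory grouping, followed by LDA, KNN, and SVM classification) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, and the spectra are synthetic stand-ins generated by a hypothetical helper, `make_synthetic_spectra`, with made-up band positions. The number of principal components, the neighbor count, the SVM kernel, and the five-fold cross-validation are likewise illustrative assumptions, since the abstract does not state them.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

CLASSES = ["G4", "iM", "DS", "SS"]


def make_synthetic_spectra(n_per_class, n_points=300, seed=0):
    """Hypothetical stand-in for SERS data: one Gaussian band per class plus noise."""
    rng = np.random.default_rng(seed)
    shift = np.linspace(400.0, 1800.0, n_points)  # illustrative Raman shift axis (cm^-1)
    # Band positions are invented for the demo and are NOT real marker bands.
    centers = {"G4": 650.0, "iM": 1250.0, "DS": 800.0, "SS": 1500.0}
    spectra, labels = [], []
    for label, n in zip(CLASSES, n_per_class):
        band = np.exp(-0.5 * ((shift - centers[label]) / 25.0) ** 2)
        for _ in range(n):
            noise = rng.normal(0.0, 0.05, n_points)
            spectra.append(band * rng.uniform(0.8, 1.2) + noise)
            labels.append(label)
    return np.array(spectra), np.array(labels)


# 54 synthetic "oligonucleotide spectra" spread over the four conformations.
X, y = make_synthetic_spectra([14, 14, 13, 13])

# Exploratory step: project the spectra onto the first two principal components.
scores = PCA(n_components=2).fit_transform(X)

# Supervised step: PCA-reduced features fed to each of the three classifiers.
models = {
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "SVM": SVC(kernel="linear", C=1.0),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
accuracy = {}
for name, clf in models.items():
    # PCA sits inside the pipeline so it is refit on each training fold only.
    pipe = make_pipeline(PCA(n_components=10), clf)
    accuracy[name] = cross_val_score(pipe, X, y, cv=cv).mean()

for name, acc in accuracy.items():
    print(f"{name}: mean cross-validated accuracy = {acc:.2f}")
```

Placing PCA inside each pipeline keeps the dimensionality reduction from seeing the held-out fold, which matters when the sample count is as small as 54. With real spectra, preprocessing such as baseline correction and normalization would typically precede this step.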