Thyroid carcinoma is the most frequently diagnosed malignancy of the endocrine system, and its main histological subtype, papillary thyroid carcinoma (PTC), accounts for 80% of thyroid malignancies. In recent years, the incidence of thyroid cancer has risen sharply, an increase closely linked to the overdiagnosis of papillary microcarcinoma (PMC). Early and accurate identification of PTC and PMC can therefore help patients avoid overtreatment. This study aimed to identify PTC and PMC using serum Raman spectroscopy. We collected serum Raman spectra from 16 patients with PTC and 31 patients with PMC. First, the imbalanced data were balanced using the synthetic minority over-sampling technique (SMOTE). Then, the dimensionality of the balanced data was reduced by principal component analysis (PCA). Finally, the processed data were fed into three classifiers: a single decision tree (DT), a random forest (RF), which is built on the bagging ensemble idea, and an Adaptive Boosting (AdaBoost) model, which is built on the boosting ensemble idea. The classification accuracies of the three models on the testing set were 75.38%, 81.54%, and 84.61%, respectively. Compared with the DT classifier, the two ensemble learning models improved accuracy by 6.16 and 9.23 percentage points, respectively, and AdaBoost was the best model. These results demonstrate that serum Raman spectroscopy combined with an ensemble learning algorithm is feasible for rapidly identifying PTC and PMC, and that the method has great potential for application in clinical diagnosis.
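The class-balancing step can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is an interpolation between a minority sample and one of its nearest minority neighbours. The function name `smote`, the neighbour count `k=5`, the random seeds, and the toy four-point "spectra" below are assumptions made for illustration. They are not the study's data or settings, and the class sizes simply mirror the patient counts rather than the actual number of spectra.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (excluding itself)
        neighbours = sorted(
            (s for s in minority if s is not base),
            key=lambda s: math.dist(base, s),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy stand-in data (hypothetical): random 4-point "spectra", not real measurements.
rng = random.Random(42)
ptc = [[rng.gauss(1.0, 0.2) for _ in range(4)] for _ in range(16)]  # minority class
pmc = [[rng.gauss(0.0, 0.2) for _ in range(4)] for _ in range(31)]  # majority class

# Oversample the minority class until both classes have the same size.
ptc_balanced = ptc + smote(ptc, n_new=len(pmc) - len(ptc))
print(len(ptc_balanced), len(pmc))  # prints: 31 31
```

In practice, a maintained implementation would normally be preferred over a hand-written one, and oversampling would typically be applied only to the training split so that synthetic samples do not leak into the testing set.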