Abstract

Background
Preoperative diagnosis of thyroid lesions by fine-needle aspiration (FNA) biopsy cytology can be challenging and is unachievable in up to 20% of cases. Patients with an indeterminate preoperative diagnosis are often referred for diagnostic surgery; most receive a benign diagnosis, which renders the surgery unnecessary. New technologies that can provide accurate preoperative diagnosis of thyroid lesions are therefore needed. Here, we employed desorption electrospray ionization mass spectrometry (DESI-MS) imaging along with statistical modeling to determine molecular signatures of benign and malignant thyroid lesions, using banked thyroid tissue samples with known histopathologic diagnoses. We then applied this methodology to the analysis and classification of preoperatively collected FNA smears.

Methods
A total of 199 fresh-frozen thyroid tissues, including 50 normal, 55 follicular adenoma (FTA), 58 follicular carcinoma (FTC), and 36 papillary carcinoma (PTC), were sectioned at a thickness of 10 µm and stored at −80 °C until analysis. FNA biopsies were prospectively collected at Baylor College of Medicine. DESI-MS imaging was performed in the negative ion mode using a Waters Xevo G2-XS mass spectrometer fitted with a DESI-XS source. After analysis, samples were stained with hematoxylin and eosin (H&E) and evaluated by pathology. Molecular profiles from tissue regions of clear histology were used to build classification models. For the FNA smears, mass spectra corresponding to clusters of thyroid cells were extracted for statistical prediction, and the predictive performance of the models on FNA smears was assessed against the pathologic diagnosis.

Results
DESI-MS analysis of thyroid tissue sections generated 151 317 individual mass spectra, comprising hundreds of lipid and metabolite features, which were used to build and validate two classification models: PTC vs benign thyroid (where benign comprises normal thyroid and FTA) and FTC vs benign thyroid.
For each of these models, the data were randomly split, with two-thirds used as a training set to generate the model and one-third used as an independent validation set to assess its performance. The PTC vs benign thyroid model achieved a prediction accuracy of 97.7% on the training set and 96.6% on the withheld validation set. The FTC vs benign thyroid model achieved an accuracy of 78.4% on both the training and validation sets. We are currently refining statistical workflows to improve the performance of the FTC model and to generate a model discriminating benign thyroid from thyroid cancer (FTC and PTC combined). When the classifiers built from tissue imaging data were tested on an initial set of 19 FNAs, overall per-sample prediction accuracies of 79% and 89% were obtained using the PTC vs benign thyroid model and the FTC vs benign thyroid model, respectively. We are currently working to collect and analyze additional FNA biopsies and to validate the performance of the classification models on them, with a focus on samples with indeterminate cytology.

Conclusions
For thyroid FNA classification, overall prediction accuracies of 79% and 89% were achieved with the PTC vs benign thyroid model and the FTC vs benign thyroid model, respectively. Adding DESI-MS to the preoperative workup could improve diagnostic specificity and thereby prevent unnecessary diagnostic surgeries.
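
As an illustration of the two-thirds/one-third random split and the accuracy metric reported in the Results, the following is a minimal Python sketch. It is not the statistical workflow used in this study: the function names `split_two_thirds` and `accuracy`, the fixed seed, and the toy labels are all hypothetical. The abstract does not state whether the split was made per spectrum or per sample, so the sketch splits a generic list of items.

```python
import random


def split_two_thirds(items, seed=0):
    """Randomly split items into a 2/3 training set and a 1/3 validation set.

    Hypothetical helper: the unit being split (spectrum or sample) is an
    assumption, since the abstract does not specify it.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the reference (pathology) labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Toy data only: 30 placeholder item IDs, not real spectra.
item_ids = list(range(30))
train_ids, val_ids = split_two_thirds(item_ids, seed=42)

# Toy labels only: two of three predictions agree with the reference.
toy_accuracy = accuracy(["PTC", "benign", "benign"], ["PTC", "benign", "PTC"])
```

Keeping the validation items out of model fitting is what allows the validation accuracy to be read as an estimate of performance on unseen data.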