Exploring Machine Learning in Chemistry through the Classification of Spectra: An Undergraduate Project

Alanah Grant St James, Claire Vallance, Thomas Mills, Malcolm I Stewart, Patrick E Bergstrom Mann, Luke Hand, Annabel S J Brunt, Andrew F Worrall, Liwen Song

doi:10.1021/acs.jchemed.2c00682

Open Access

https://doi.org/10.1021/acs.jchemed.2c00682

Journal: Journal of Chemical Education
Publication Date: Feb 13, 2023
Citations: 8
License type: CC BY 4.0

Affiliation: University of Oxford

Abstract

Applications of machine learning in chemistry are many and varied, ranging from the prediction of structure–property relationships to the modeling of potential energy surfaces for large-scale atomistic simulations. We describe a generalized approach for applying machine learning to the classification of spectra, which can be used as the basis for a wide variety of undergraduate projects. While our examples use FTIR and mass spectra, the approach could equally well be used with UV–visible, Raman, NMR, or indeed any other type of spectra. We summarize a number of different unsupervised and supervised machine learning algorithms that can be used to classify spectra into groups, and illustrate their application using data from three different projects carried out by fourth-year chemistry undergraduates. The three projects investigated the ability of the various machine learning approaches to correctly classify spectra of a variety of fruits, whiskies, and teas, respectively. In all cases, the algorithms were able to differentiate between the various samples used in each study, and the trained machine learning models could then be used to classify unknown samples with a high degree of accuracy (>98% in many cases). Depending on the extent to which students are expected to write their own code to perform the data analysis, the general model adopted in this work can be adapted for a variety of purposes, from short (one- to two-day) practical exercises and workshops to much longer independent student projects.
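To make the idea of supervised spectral classification concrete, the following is a minimal sketch and not the authors' code. The abstract does not name the algorithms used, so a nearest-centroid classifier is swapped in here purely because it is short to write from scratch. The spectra are synthetic single Gaussian bands with added noise, and the class names, band positions, noise level, and all function names are assumptions made for illustration only.

```python
import math
import random


def synthetic_spectrum(peak_center, n_points=200, noise=0.05, rng=random):
    """Return one noisy Gaussian band on n_points channels (made-up data)."""
    width = 8.0
    return [
        math.exp(-((i - peak_center) ** 2) / (2 * width ** 2)) + rng.gauss(0.0, noise)
        for i in range(n_points)
    ]


def normalize(spectrum):
    """Scale to unit Euclidean length so overall intensity does not dominate."""
    norm = math.sqrt(sum(v * v for v in spectrum)) or 1.0
    return [v / norm for v in spectrum]


def fit_centroids(spectra, labels):
    """Training step: compute the mean spectrum of each labeled class."""
    sums, counts = {}, {}
    for s, y in zip(spectra, labels):
        if y not in sums:
            sums[y] = [0.0] * len(s)
            counts[y] = 0
        sums[y] = [a + b for a, b in zip(sums[y], s)]
        counts[y] += 1
    return {y: [v / counts[y] for v in sums[y]] for y in sums}


def predict(centroids, spectrum):
    """Assign the class whose mean spectrum is nearest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], spectrum))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Hypothetical classes, each defined only by where its band sits.
classes = {"sample_A": 60, "sample_B": 100, "sample_C": 140}


def make_set(n_per_class):
    """Generate normalized synthetic spectra and their labels."""
    X, y = [], []
    for label, center in classes.items():
        for _ in range(n_per_class):
            X.append(normalize(synthetic_spectrum(center, rng=rng)))
            y.append(label)
    return X, y


X_train, y_train = make_set(20)  # labeled spectra used for training
X_test, y_test = make_set(10)    # held-out spectra treated as "unknowns"

centroids = fit_centroids(X_train, y_train)
predictions = [predict(centroids, s) for s in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The split into a training set and a held-out set mirrors the workflow the abstract describes, in which a trained model is then used to classify unknown samples. Real spectra would replace `make_set`, and any accuracy obtained on this artificial data says nothing about the accuracies reported in the paper.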

Full Text