Ultraviolet (UV) absorption spectroscopy is a widely used tool for quantitative and qualitative analysis of chemical compounds. In the gas phase, vacuum UV (VUV) and UV absorption spectra are specific and diagnostic for many small molecules. Accurate prediction of VUV/UV absorption spectra can aid the characterization of new or unknown molecules in areas such as fuels, forensics, and pharmaceutical research. Artificial intelligence offers an alternative to quantum chemical spectral prediction. Here, several molecular feature representations were used or newly developed to encode chemical structures, and three machine learning models were tested for predicting gas-phase VUV/UV absorption spectra. Structure data files (.sdf) and VUV/UV absorption spectra for 1397 volatile and semivolatile chemical compounds were used to train and test the models. New molecular features (termed ABOCH) were introduced to better capture pi-bonding, aromaticity, and halogenation. Incorporating these features improved spectral prediction and outperformed computationally intensive molecular-based deep learning methods. Of the machine learning methods tested, a Random Forest regressor gave the best accuracy score with the shortest training time. The resulting prediction model also outperformed spectral predictions based on time-dependent density functional theory.