In recent years, with the rise of various machine learning methods, ultraviolet and near-infrared (UV-NIR) spectral analysis has shown impressive performance in analyzing complex systems. However, UV-NIR spectral analysis based on traditional machine learning requires separate training, with tedious parameter tuning, for each sample type or task. As a result, training a high-quality model is often complicated and time-consuming. Large language models (LLMs) are among the cutting-edge achievements of deep learning, with parameter counts on the order of billions. LLMs can extract abstract information from their input and use it effectively. Even without additional training, and using only simple natural-language prompts, LLMs can accomplish previously unseen tasks in entirely new domains. We aim to exploit this capability in spectral analysis to reduce time costs and operational difficulty. In this study, we used UV-NIR spectral analysis to predict the chemical oxygen demand (COD) of three different water samples, including a complex wastewater. We extracted characteristic bands from the spectra and input them into an LLM for concentration prediction. We compared the COD predictions of different models on the water samples and discussed how different experimental settings affect the LLM. The results show that, even with brief prompts, the LLM achieved the best performance on the wastewater, with R² = 0.931 and RMSE = 10.966, outperforming the best traditional model (R² = 0.920, RMSE = 11.854). This indicates that LLMs, with simpler operation and lower time cost, can approach or even surpass traditional machine learning models in UV-NIR spectral analysis. In conclusion, our study proposes a new LLM-based method for UV-NIR spectral analysis and provides a preliminary demonstration of the potential of LLMs for this application.
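To make the workflow more concrete, the following is a minimal sketch, not the authors' implementation, of two steps described above: serializing characteristic-band absorbances into a few-shot text prompt for an LLM, and scoring the returned predictions with R² and RMSE. The prompt wording, the function names (`build_prompt`, `r2_score`, `rmse`), and the band labels are hypothetical; the actual LLM call and the parsing of its reply are omitted because the abstract does not specify them.

```python
import math
from typing import Sequence


def build_prompt(
    train_bands: Sequence[Sequence[float]],
    train_cod: Sequence[float],
    query_bands: Sequence[float],
    band_names: Sequence[str],
) -> str:
    """Format labeled spectra plus one unlabeled query as a few-shot prompt.

    Hypothetical format: one line per training sample ("band: value, ... -> COD: y"),
    then the query line ending in "COD:" for the LLM to complete.
    """
    def features(values: Sequence[float]) -> str:
        return ", ".join(f"{name}: {v:.4f}" for name, v in zip(band_names, values))

    lines = ["Predict COD (mg/L) from absorbances at the characteristic bands."]
    for bands, cod in zip(train_bands, train_cod):
        lines.append(f"{features(bands)} -> COD: {cod:.1f}")
    lines.append(f"{features(query_bands)} -> COD:")
    return "\n".join(lines)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error: sqrt(mean((y - y_hat)^2))."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative values only; these are not data from the study.
    names = ["band_1", "band_2"]
    prompt = build_prompt([[0.12, 0.30], [0.25, 0.41]], [80.0, 120.0], [0.18, 0.35], names)
    print(prompt)
    print(rmse([80.0, 120.0], [85.0, 115.0]), r2_score([80.0, 120.0], [85.0, 115.0]))
```

Because RMSE is expressed in the units of COD while R² is dimensionless, reporting both, as the abstract does, shows the absolute error size as well as the fraction of variance explained.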