Rosmarinic acid (RA) is a phenolic antioxidant that occurs naturally in plants of the Lamiaceae family, including basil (Ocimum basilicum L.). Existing analytical methods for determining the RA content of leaves are time-consuming and destructive, which limits quality assessment and control during cultivation. In this study, we aimed to develop non-destructive prediction models for the RA content of basil plants using a portable hyperspectral imaging (HSI) system and machine learning algorithms. The basil plants were grown in a vertical farm module under a controlled environment, and hyperspectral images of whole plants were captured with a portable HSI camera in the range of 400–850 nm. Average spectra were extracted from the segmented plant regions. We applied several spectral pre-processing methods and ensemble learning algorithms (Random Forest, AdaBoost, XGBoost, and LightGBM) both to build the RA prediction models and to select features according to feature importance. The best RA prediction model was a LightGBM model combined with feature selection by the AdaBoost algorithm and spectral pre-processing by logarithmic transformation followed by a second derivative. This model performed satisfactorily for practical screening, with a coefficient of determination for prediction (R²P) of 0.81 and a root mean square error of prediction (RMSEP) of 3.92. Applied to in-field HSI data, the developed model successfully estimated and visualized the RA distribution in basil plants growing in the greenhouse. Our findings demonstrate the potential of a portable HSI system for monitoring and controlling pharmaceutical quality in medicinal plants during cultivation. This rapid, non-destructive method can serve as a valuable tool for assessing the RA content of basil plants, thereby improving the efficiency and accuracy of quality control at the cultivation stage.
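The pre-processing and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the logarithmic transformation is the common pseudo-absorbance form log10(1/R) and that the second derivative is a plain central finite difference; the study may have used a different form, such as a smoothed Savitzky-Golay derivative. The function names (`log_transform`, `second_derivative`, `preprocess`, `rmse`, `r_squared`) are hypothetical, and the metrics follow the standard definitions of RMSE and R² as they would be computed on a prediction set.

```python
import math


def log_transform(reflectance):
    """Convert reflectance values (0 < R <= 1) to pseudo-absorbance log10(1/R).

    Assumption: the abstract does not state the exact logarithmic form.
    """
    return [math.log10(1.0 / r) for r in reflectance]


def second_derivative(values, step=1.0):
    """Central finite-difference second derivative along the wavelength axis.

    Assumption: unsmoothed differences; the two end points are dropped.
    """
    return [
        (values[i - 1] - 2.0 * values[i] + values[i + 1]) / (step * step)
        for i in range(1, len(values) - 1)
    ]


def preprocess(reflectance, step=1.0):
    """Apply the logarithmic transformation, then the second derivative."""
    return second_derivative(log_transform(reflectance), step)


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

In such a pipeline, the output of `preprocess` for each plant's average spectrum would be the feature vector passed to the feature-selection and regression models, and `rmse` and `r_squared` evaluated on held-out samples would correspond to RMSEP and R²P.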