Unveiling key peak features for olive oil authentication utilizing Raman spectroscopy and chemometrics.

Abstract

Adulteration of olive oil significantly compromises the interests of both producers and consumers, making its authentication a crucial challenge in the food industry. This study explored the potential of combining Raman spectroscopy with machine learning for discriminating various blended samples and quantifying olive oil content in mixtures. Raman features, such as peak intensities at specific shifts, were extracted from the spectra and analyzed using hierarchical cluster analysis (HCA) and correlation analysis (CA) to identify significant variations corresponding to altered proportions of olive oil. Qualitative and quantitative analyses were performed to classify 10 oil types and predict compositional ratios in binary and ternary blends, comparing different chemometric techniques and input features. Among these, the random forest (RF) model yielded a high classification accuracy (98.9%) and strong predictive performance, with coefficients of determination (R²) of 0.985 and 0.926 on the binary and ternary samples, respectively. The Shapley additive explanations (SHAP) algorithm was subsequently employed to assess the contribution of key Raman features to the prediction accuracy of the best-performing models. Overall, this novel analytical framework highlights Raman features and offers a promising solution for real-time quality monitoring of olive oil products.
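
To make the SHAP step more concrete, the following is a minimal sketch of how exact Shapley values attribute one prediction to individual input features. It is not the authors' pipeline: the toy model `toy_olive_fraction`, its coefficients, the three generic peak intensities, and the all-zero baseline are hypothetical, and the study's RF models would normally be explained with a tree-specific SHAP implementation over background data rather than by brute-force enumeration against a single baseline.

```python
from itertools import combinations
from math import factorial


def toy_olive_fraction(z):
    """Hypothetical model: maps three peak intensities to a score.

    The coefficients and the interaction term are made up for illustration.
    """
    return 0.5 * z[0] + 0.3 * z[1] + 0.2 * z[0] * z[2]


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are set to their baseline value.
    The cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


x = [1.0, 2.0, 3.0]          # hypothetical peak intensities of one sample
baseline = [0.0, 0.0, 0.0]   # hypothetical reference spectrum
phi = shapley_values(toy_olive_fraction, x, baseline)

for name, contribution in zip(["peak_1", "peak_2", "peak_3"], phi):
    print(f"{name}: {contribution:.3f}")
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that gives SHAP its name; the interaction term is split evenly between the two features involved.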

Similar Papers
  • Research Article
  • Citations: 6
  • 10.1016/j.rinp.2024.107978
Interpretable machine learning methods to predict the mechanical properties of ABX3 perovskites
  • Sep 18, 2024
  • Results in Physics
  • S.B Akinpelu + 7 more

  • Research Article
  • Citations: 87
  • 10.1016/j.aca.2020.07.029
Application of High Resolution Mass Spectrometric methods coupled with chemometric techniques in olive oil authenticity studies - A review
  • Jul 30, 2020
  • Analytica Chimica Acta
  • Natasa P Kalogiouri + 3 more

  • Research Article
  • Citations: 8
  • 10.1016/j.crfs.2024.100913
Integrating near-infrared hyperspectral imaging with machine learning and feature selection: Detecting adulteration of extra-virgin olive oil with lower-grade olive oils and hazelnut oil
  • Jan 1, 2024
  • Current Research in Food Science
  • Derick Malavi + 2 more

  • Research Article
  • Citations: 2
  • 10.1021/acs.jcim.5c02015
Improving Machine Learning Classification Predictions through SHAP and Features Analysis Interpretation.
  • Oct 20, 2025
  • Journal of chemical information and modeling
  • Leonardo Bernal + 2 more

Tree-based machine learning (ML) algorithms, such as Extra Trees (ET), Random Forest (RF), Gradient Boosting Machine (GBM), and XGBoost (XGB), are among the most widely used in early drug discovery, given their versatility and performance. However, models based on these algorithms often suffer from misclassification and reduced interpretability, which limit their applicability in practice. To address these challenges, several approaches have been proposed, including the use of SHapley Additive Explanations (SHAP). While SHAP values are commonly used to elucidate the importance of features driving models' predictions, they can also be employed in strategies to improve their prediction performance. Building on these premises, we propose a novel approach that integrates SHAP and feature value analyses to reduce misclassification in model predictions. Specifically, we benchmarked classifiers based on ET, RF, GBM, and XGB algorithms using data sets of compounds with known antiproliferative activity against three prostate cancer (PC) cell lines (i.e., PC3, LNCaP, and DU-145). The best-performing models, based on RDKit and ECFP4 descriptors with GBM and XGB algorithms, achieved MCC values above 0.58 and F1-scores above 0.8 across all data sets, demonstrating satisfactory accuracy and precision. Analyses of SHAP values revealed that many misclassified compounds possess feature values that fall within the range typically associated with the opposite class. Based on these findings, we developed a misclassification-detection framework using four filtering rules, which we termed "RAW", "SHAP", "RAW OR SHAP", and "RAW AND SHAP". These filtering rules successfully identified several potentially misclassified predictions, with the "RAW OR SHAP" rule retrieving up to 21%, 23%, and 63% of misclassified compounds in the PC3, DU-145, and LNCaP test sets, respectively. The developed flagging rules enable the systematic exclusion of likely misclassified compounds, even across progressively higher prediction confidence levels, thus providing a valuable approach to improve classifier performance in virtual screening applications.

  • Research Article
  • Citations: 32
  • 10.3390/polym14183906
Application of Ensemble Machine Learning Methods to Estimate the Compressive Strength of Fiber-Reinforced Nano-Silica Modified Concrete
  • Sep 19, 2022
  • Polymers
  • Madiha Anjum + 5 more

In this study, the compressive strength (CS) of fiber-reinforced nano-silica concrete (FRNSC) was predicted using ensemble machine learning (ML) approaches. Four types of ensemble ML methods were employed: gradient boosting, random forest, bagging regressor, and AdaBoost regressor. The validity of the employed models was tested and compared using statistical tests, the coefficient of determination (R²), and the k-fold method. Moreover, a Shapley Additive Explanations (SHAP) analysis was used to observe the interaction and effect of input parameters on the CS of FRNSC. Six input features, including fiber volume, coarse aggregate to fine aggregate ratio, water to binder ratio, nano-silica, superplasticizer to binder ratio, and specimen age, were used for modeling. In predicting the CS of FRNSC, gradient boosting was the least accurate model and the AdaBoost regressor was the most accurate. However, the performance of random forest and the bagging regressor was comparable to that of the AdaBoost regressor model. The R² values for the gradient boosting, random forest, bagging regressor, and AdaBoost regressor models were 0.82, 0.91, 0.91, and 0.92, respectively. The error values of the models further supported the accuracy of the ML methods: the average error values for the gradient boosting, random forest, bagging regressor, and AdaBoost regressor models were 5.92, 4.38, 4.24, and 3.73 MPa, respectively. The SHAP analysis showed that the coarse aggregate to fine aggregate ratio has the stronger negative correlation with the CS of FRNSC, whereas specimen age affects it positively. Nano-silica, fiber volume, and the ratio of superplasticizer to binder have both positive and deleterious effects on the CS of FRNSC. Employing these methods will benefit the building sector by providing fast and economical methods for calculating material properties and the impact of raw ingredients.

  • Research Article
  • Citations: 4
  • 10.29050/harranziraat.478010
Using chromatographic methods in detection of olive oil adulteration
  • Sep 19, 2019
  • Harran Tarım ve Gıda Bilimleri Dergisi
  • Songül Kesen

In this study, the adulteration of olive oil with olive pomace oil was monitored by fatty acid, ΔECN42, and sterol analyses. To this end, virgin olive oil obtained from cv. Kilis Yaglik (KY) was mixed with olive pomace oil at different proportions (1, 5 and 10%). Gas chromatography (GC) was used to analyse fatty acid and sterol compositions. The fatty acids with Equivalent Carbon Number 42 (ECN42) and the ΔECN42 values of pure and adulterated oils were also used to determine adulteration. According to the fatty acid analysis, the ratios of oleic acid and palmitic acid in the olive oil decreased when olive pomace oil was added. The difference between the theoretical and experimental ECN42 values (ΔECN42) increased in adulterated oils. Beta-sitosterol, an important compound in the sterol composition, increased up to 81.42% when the oil was mixed with 10% olive pomace oil. The adulterated oils also displayed higher Rmar values than the pure oil. According to the PCA, the oil samples fell into three groups based on their fatty acid and TAG profiles, and into four groups based on their sterol composition. In all of the PCA analyses, pure KY oil was clearly separated from the adulterated oils.

  • Research Article
  • 10.3390/rs18010040
Refined Leaf Area Index Retrieval in Yellow River Delta Coastal Wetlands: UAV-Borne Hyperspectral and LiDAR Data Fusion and SHAP–Correlation-Integrated Machine Learning
  • Dec 23, 2025
  • Remote Sensing
  • Chenqiang Shan + 9 more

The leaf area index (LAI) serves as a critical parameter for assessing wetland ecosystem functions, and accurate LAI retrieval holds substantial significance for wetland conservation and ecological monitoring. To address the spatial constraints of traditional ground-based measurements and the limited accuracy of single-source remote sensing data, this study utilized unmanned aerial vehicle (UAV)-borne hyperspectral and LiDAR sensors to acquire high-quality multi-source remote sensing data of coastal wetlands in the Yellow River Delta. Three machine learning algorithms, namely random forest (RF), Extreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost), were employed for LAI retrieval modeling. A total of 38 vegetation indices (VIs) and 12 point cloud features (PCFs) were extracted from hyperspectral imagery and LiDAR point cloud data, respectively. Pearson correlation analysis and the Shapley Additive Explanations (SHAP) method were integrated to identify and select the most informative VIs and PCFs. The performance of LAI retrieval models built on single-source features (VIs or PCFs) or multi-source feature fusion was evaluated using the coefficient of determination (R²) and root mean square error (RMSE). The main findings are as follows: (1) Multi-source feature fusion significantly improved LAI retrieval accuracy, with the RF model achieving the highest performance (R² = 0.968, RMSE = 0.125). (2) LiDAR-derived structural metrics and hyperspectral-derived vegetation indices were identified as critical factors for accurate LAI retrieval. (3) The feature selection method integrating mean absolute SHAP values (|SHAP| values) with Pearson correlation analysis enhanced model robustness. (4) The intertidal zone exhibited pronounced spatial heterogeneity in the vegetation LAI distribution.

  • Preprint Article
  • 10.21203/rs.3.rs-5946945/v1
Hydro-environmental predictive management of sub-surface salinization in arid nearshore-coastal saline aquifer using deep learning and SHAP analysis
  • Mar 14, 2025
  • Fahad Jibrin Abdu + 6 more

Groundwater (GW) management is vital in arid regions like Saudi Arabia, where agriculture heavily depends on this resource. Traditional GW monitoring and prediction methods often fall short of capturing the complex interactions and temporal dynamics of GW systems. This study introduces an innovative approach that integrates deep learning (DL) techniques with Shapley Additive Explanations (SHAP) to enhance GW predictive management in Saudi Arabia’s agricultural regions. SHAP analysis is used to interpret each feature’s influence on the model’s predictions, thereby improving the transparency and understanding of the models’ decision-making processes. Six different data-driven models, including Hammerstein-Wiener (HW), Random Forest (RF), Artificial Neural Networks (ANNs), eXtreme Gradient Boosting (XGBoost), Convolutional Neural Networks (CNNs), and Long Short-Term Memory (LSTM), were utilized to predict GW salinity based on electrical conductivity (EC). The calibration results suggest that the RF model exhibits the highest Determination Coefficient (DC) of 0.9903 and Nash-Sutcliffe Efficiency (NSE) of 0.9899, indicating its superior predictive accuracy, followed closely by the LSTM model with a DC of 0.9835 and NSE of 0.9827. During the validation phase, the LSTM model demonstrated superior performance with the lowest Mean Absolute Error (MAE) of 13.9547 and Mean Absolute Percentage Error (MAPE) of 0.2813, indicating minimal deviation between predicted and observed EC values. The SHAP analysis revealed that chloride (Cl), with a mean SHAP value of ~ 1250, has the highest impact on EC, suggesting that variations in chloride concentration significantly influence GW salinity. Magnesium (Mg) follows closely with a mean SHAP value of ~ 1200, highlighting its role in water hardness and EC. Sodium (Na), with a mean SHAP value of ~ 600, has a moderate impact, contributing to overall salinity from natural processes and human activities. 
The proposed method has proven effective, with the LSTM algorithm offering an excellent and reliable tool for predicting EC. This advancement will result in more efficient planning and decision-making related to water resources.

  • Research Article
  • Citations: 84
  • 10.1016/j.talanta.2017.09.095
Detection and quantification of extra virgin olive oil adulteration by means of autofluorescence excitation-emission profiles combined with multi-way classification
  • Oct 12, 2017
  • Talanta
  • Isabel Durán Merás + 3 more

  • Research Article
  • 10.3390/en19010124
A Day-Ahead Wind Power Dynamic Explainable Prediction Method Based on SHAP Analysis and Mixture of Experts
  • Dec 25, 2025
  • Energies
  • Hao Zhang + 5 more

Traditional single-prediction models often exhibit limitations in meeting wind power prediction requirements in complex operational scenarios. Furthermore, the inherent “black-box” nature of deep learning models leads to limited interpretability of predictions, hindering effective support for grid dispatch planning. To address these issues, this study proposes a novel day-ahead wind power prediction method, referred to as SHapley Additive exPlanations (SHAP)–Mixture of Experts (MoE), which integrates SHAP into an MoE framework. Here, SHAP is employed for interpretability purposes. This study innovatively transforms SHAP analysis into prior knowledge to guide the decision-making of the MoE gating network and proposes a two-layer dynamic interpretation mechanism based on the collaborative analysis of gating weights and SHAP values. This approach clarifies key meteorological factors and the model’s advantageous scenarios, while quantifying the uncertainty among multiple expert decisions. Firstly, each expert model was pre-trained, and its parameters were frozen to construct a candidate expert pool. Secondly, the SHAP vectors for each pre-trained expert were computed over all sample features to characterize their decision-making logic under varying scenarios. Thirdly, an augmented feature set was constructed by fusing the original meteorological features with SHAP attribution matrices from all experts; this set was used to train the gating network within the MoE framework. Finally, for new input samples, each frozen expert model generates a prediction along with its corresponding SHAP vector, and the gating network aggregates these predictions to produce the final forecast. The proposed method was validated using operational data from an offshore wind farm located in southeastern China. Compared with the best individual expert model and traditional ensemble forecasting models, the proposed method reduces the Root Mean Square Error (RMSE) by 0.23% to 4.92%. 
Furthermore, the method elucidates the influence of key features on each expert’s decisions, offering insights into how the gating network adaptively selects experts based on the input features and expert-specific characteristics across different scenarios.

  • Research Article
  • 10.3389/fnut.2025.1660430
Dietary patterns and chronic prostatitis: a symptom severity prediction model based on nutritional clustering and machine learning
  • Jan 19, 2026
  • Frontiers in Nutrition
  • Zhen Wang + 5 more

Background: Chronic prostatitis/chronic pelvic pain syndrome (CP/CPPS) has a multifactorial etiology where diet is considered an important factor. This study aimed to develop a predictive model for CP/CPPS symptom severity by analyzing food frequency questionnaire (FFQ) data with machine learning techniques, providing a basis for personalized nutritional interventions.
Methods: This study included 313 patients with CP/CPPS. We used principal component analysis (PCA) to extract dietary patterns from FFQ data and applied LASSO regression to select key predictors of symptom severity. Subsequently, six machine learning models (logistic regression, random forest, XGBoost, support vector machine, K-nearest neighbors, and multilayer perceptron) were trained and compared. Model performance was evaluated using ROC curves, decision curve analysis (DCA), and calibration plots. SHapley Additive exPlanations (SHAP) were used to interpret the optimal model.
Results: PCA identified two major dietary patterns: a “Red Meat and Processed Food” dietary pattern (PC1) and a “Dairy-rich” pattern (PC2). LASSO regression selected key predictors, among which the “Red Meat and Processed Food” dietary pattern demonstrated the strongest positive association with CP/CPPS symptom severity. Among the models, while support vector machine (SVM) and logistic regression showed high AUC values, the XGBoost model demonstrated the best overall performance across a balance of metrics including accuracy, precision, recall, and F1-score, and was selected as the final model (AUC = 0.883). SHAP analysis identified the Red Meat and Processed Food dietary pattern as the most important feature associated with symptom severity.
Conclusion: This study successfully developed a machine learning model based on dietary patterns that effectively predicts CP/CPPS symptom severity. The model underscores the significant association between nutrition and disease management and, with its strong predictive performance and interpretability, offers a novel tool for precision nutrition in CP/CPPS.

  • Research Article
  • 10.47772/ijriss.2025.910000086
Enhancing Employee Productivity and Satisfaction in Malaysian SMEs Using Explainable AI-Based Predictive Modeling
  • Nov 5, 2025
  • International Journal of Research and Innovation in Social Science
  • Nur Diana Izzani Masdzarif + 3 more

This study investigates the application of Explainable Artificial Intelligence (XAI) in predicting employee productivity and job satisfaction in Malaysian small and medium enterprises (SMEs). A predictive modeling framework using Random Forest and SHAP (SHapley Additive exPlanations) is designed to forecast employee outcomes and identify the key drivers influencing workplace productivity and satisfaction. Data from 150 employees across 10 SMEs was collected through surveys, focusing on variables such as autonomy, workload, managerial feedback, and digital tool usage. Results indicate strong predictive performance, with XAI explanations highlighting autonomy and workload as the most influential factors. By integrating XAI into HR analytics, managers can make transparent, data-driven decisions that enhance employee trust, adoption, and engagement. This study contributes to HR management and AI literature by demonstrating a novel framework for explainable workforce analytics tailored to SMEs.

  • Research Article
  • 10.1038/s41598-025-17588-9
Enhancing wellbore stability through machine learning for sustainable hydrocarbon exploitation
  • Oct 9, 2025
  • Scientific Reports
  • Mohatsim Mahetaji + 1 more

Wellbore instability, manifested through formation breakouts and drilling-induced fractures, poses serious technical and economic risks in drilling operations. It can lead to non-productive time, stuck pipe incidents, wellbore collapse, and increased mud costs, ultimately compromising operational safety and project profitability. Accurately predicting such instabilities is therefore critical for optimizing drilling strategies and minimizing costly interventions. This study explores the application of machine learning (ML) regression models to predict wellbore instability more accurately, using open-source well data from the Netherlands well Q10-06. The dataset spans a depth range of 2177.80 to 2350.92 m, comprising 1137 data points at 0.1524 m intervals, and integrates composite well logs, real-time drilling parameters, and wellbore trajectory information. Borehole enlargement, defined as the difference between Caliper (CAL) and Bit Size (BS), was used as the target output to represent instability. Twelve regression models were evaluated, including Linear and Polynomial Regression, Decision Tree, Random Forest, Gradient Boosting, Histogram Gradient Boosting, Support Vector Regression, Multi-layer Perceptron, k-Nearest Neighbors, Gaussian and Bernoulli Naive Bayes, and Gaussian Process Regression. Model performance was assessed using the Root Mean Squared Error (RMSE) and Coefficient of Determination (DC). Among them, Histogram Gradient Boosting yielded the highest prediction accuracy (RMSE = 8.5138 × 10⁻² in, DC = 0.99), followed closely by Gradient Boosting, Random Forest, and Decision Tree models. Conversely, Bernoulli Naive Bayes and Support Vector Regression demonstrated poor generalization. To interpret model predictions, SHAP (SHapley Additive exPlanations) analysis was employed, highlighting the most influential features and their directional impacts. The SHAP results aligned closely with heatmap-based feature correlations, confirming that high-performing models considered a diverse set of features, while underperforming models were overly reliant on limited inputs. This study demonstrates that bypassing traditional empirical correlations in data-driven machine learning techniques can enhance prediction accuracy while preserving model interpretability through SHAP analysis.
Supplementary Information: The online version contains supplementary material available at 10.1038/s41598-025-17588-9.
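
The target and the two evaluation metrics named in this abstract can be written out as a short sketch. The caliper readings, bit size, and model predictions below are made-up illustrative numbers, not values from well Q10-06, and the function names are hypothetical.

```python
from math import sqrt


def borehole_enlargement(caliper_in, bit_size_in):
    """Target variable: caliper reading minus bit size, per depth sample (inches)."""
    return [cal - bs for cal, bs in zip(caliper_in, bit_size_in)]


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def coefficient_of_determination(y_true, y_pred):
    """1 - SS_res / SS_tot, the usual definition of R² (called DC in the abstract)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical log samples (inches); not data from the study.
caliper = [8.75, 9.0, 9.5, 8.5]
bit_size = [8.5, 8.5, 8.5, 8.5]
observed = borehole_enlargement(caliper, bit_size)

# Hypothetical output of some regression model for the same depths.
predicted = [0.25, 0.5, 0.75, 0.25]

print("enlargement:", observed)
print("RMSE:", round(rmse(observed, predicted), 4))
print("R2:", round(coefficient_of_determination(observed, predicted), 4))
```

RMSE is reported in the units of the target (inches here), while the coefficient of determination is unitless, which is why the abstract quotes both.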

  • Research Article
  • Citations: 1
  • 10.2118/224438-pa
Improved Reservoir Porosity Estimation Using an Enhanced Group Method of Data Handling with Differential Evolution Model and Explainable Artificial Intelligence
  • Feb 4, 2025
  • SPE Journal
  • Christopher N Mkono + 7 more

Summary: Reservoir characterization is critical to the oil and gas industry, influencing field development, production optimization, hydraulic fracturing, and reserves estimation decisions. Accurately estimating porosity is crucial for reservoir characterization, well planning, and production optimization. Traditional porosity determination methods, such as porosimetry, geostatistical, and core analysis, often involve complex geological and geophysical models, which are expensive and time-consuming. This study used an integrated machine learning and optimization model, the group method of data handling with differential evolution (GMDH-DE), to estimate porosity from integrated well log and core data from the Mpyo oil field, Uganda. The GMDH-DE demonstrates superior performance compared with conventional GMDH, support vector regression (SVR), and random forest (RF), achieving a coefficient of determination (R²) of 0.9925 and a root mean square error (RMSE) of 0.0017 during training, an R² of 0.9845 with an RMSE of 0.0121 during testing, and an R² of 0.9825 with an RMSE of 0.00018 during validation. A key novelty of this work is the integration of Shapley additive explanations (SHAP), which provides an interpretable analysis of the model’s input features. SHAP reveals that bulk density (RHOB) and neutron porosity (NPHI) are the most critical parameters for porosity estimation, offering valuable insight into feature importance. The proposed GMDH-DE model and SHAP analysis represent a novel and independent approach for accurate porosity estimation and interpretability, significantly enhancing the efficiency and reliability of hydrocarbon exploration and development.

  • Research Article
  • Citations: 5
  • 10.1177/0095244309104461
Morphology Development and Melt Linear Viscoelastic Properties of (PA6/PP/PS) Ternary Blend Systems
  • Jun 26, 2009
  • Journal of Elastomers & Plastics
  • Hadi Mohammadigoushki + 2 more

The morphology development and melt linear viscoelastic properties of PA6/PP/PS (70/15/15) ternary blends were studied. An attempt was also made to predict the morphology development of these blends using the dynamic interfacial tension of the blend components evaluated from the Palierne's viscoelastic model in conjunction with spreading coefficient approach. The blend samples were prepared by melt blending in an internal mixer at temperature of 260°C and rotor speed of 60 rpm. The ternary blend samples exhibited a pronounced low-frequency nonterminal storage modulus whose values were much greater than those predicted for elastic response of the binary blend samples. This was attributed to strong elastic resistance of a core-shell composite droplet formed in the ternary blend samples that was evidenced by the SEM micrographs of these samples. The results predicted based on spreading coefficient concept also suggested a core-shell type morphology in which PP core was encapsulated by PS shell as a composite minor phase dispersed in PA6 matrix. It was demonstrated that there is a close relationship between melt viscoelastic properties and morphology of ternary blends.
