Optimized synthetic data and semi-supervised learning for Derived Cetane Number prediction

Manaf Sheyyab, Patrick T Lynch, Eric K Mayhew, Kenneth Brezinsky

doi:10.1016/j.combustflame.2023.113184

Abstract

Derived Cetane Number (DCN) is a critical indicator of the ignition quality of fuels in diesel engines. Training generalized regression models for DCN is challenging because limited data are available. In this study, we propose a novel semi-supervised approach that combines real hydrocarbon mixtures with synthetically generated mixtures to overcome this data scarcity. The synthetic mixtures are generated using a Sequential Least Squares Programming (SLSQP) optimization method that targets comprehensive coverage of UNIFAC chemical functional group compositions. Using the combined dataset of real and synthetic mixtures, an Artificial Neural Network (ANN) model is trained on the UNIFAC chemical functional group composition of fuels, improving DCN prediction accuracy and model reliability compared to models trained solely on real data. The improved model achieves excellent performance on the training and testing datasets, as indicated by a high R² score and low Mean Squared Error (MSE) and Mean Absolute Error (MAE) values. Moreover, the model predicts accurately for ten real fuels and eighteen of their mixtures, with 100 % of samples falling within ±10 % of the DCN measured by an Ignition Quality Tester (IQT). The results demonstrate the potential of augmenting real data with appropriate data generation techniques to improve the representativeness and predictive capabilities of models when experimental data are limited.
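The evaluation criteria named in the abstract (R² score, MSE, MAE, and the share of predictions within ±10 % of the measured DCN) can be illustrated with a minimal sketch. This is not the authors' code: the function name `regression_metrics` is hypothetical, and the DCN values below are made-up placeholders rather than data from the study.

```python
def regression_metrics(measured, predicted, tolerance=0.10):
    """Return R2, MSE, MAE and the fraction of predictions whose
    relative error is within +/- tolerance of the measured value."""
    n = len(measured)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [p - m for m, p in zip(measured, predicted)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n

    mean_measured = sum(measured) / n
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    # R2 is undefined when all measured values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    within = sum(
        abs(e) <= tolerance * abs(m) for m, e in zip(measured, errors)
    ) / n
    return {"r2": r2, "mse": mse, "mae": mae, "within_tolerance": within}


# Placeholder values for illustration only (not from the paper).
measured = [40.0, 50.0, 60.0]
predicted = [42.0, 49.0, 61.0]
metrics = regression_metrics(measured, predicted)
print(metrics)
```

A good fit shows up as an R² close to 1 together with small MSE and MAE; `within_tolerance` equal to 1.0 corresponds to every sample falling inside the ±10 % band.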

Full Text