Machine learning framework to predict product distribution of lignocellulosic biomass pyrolysis.
1757
- 10.1016/j.rser.2015.12.185
- Jan 8, 2016
- Renewable and Sustainable Energy Reviews
14
- 10.1016/j.jclepro.2023.137472
- May 12, 2023
- Journal of Cleaner Production
11
- 10.1016/j.cherd.2018.04.021
- Apr 25, 2018
- Chemical Engineering Research and Design
14
- 10.1016/j.jaap.2022.105546
- May 12, 2022
- Journal of Analytical and Applied Pyrolysis
344
- 10.1016/j.ces.2007.11.024
- Nov 19, 2007
- Chemical Engineering Science
5
- 10.1021/acs.cgd.3c01027
- Dec 12, 2023
- Crystal Growth & Design
101
- 10.1021/acssuschemeng.6b03098
- Mar 3, 2017
- ACS Sustainable Chemistry & Engineering
197
- 10.1021/acssuschemeng.6b03096
- Mar 2, 2017
- ACS Sustainable Chemistry & Engineering
405
- 10.1002/cjce.5450670111
- Feb 1, 1989
- The Canadian Journal of Chemical Engineering
39
- 10.1016/j.cej.2016.09.135
- Sep 29, 2016
- Chemical Engineering Journal
- Research Article
16
- 10.1109/jbhi.2019.2961808
- Nov 1, 2020
- IEEE Journal of Biomedical and Health Informatics
Parkinson's Disease is a disorder with diagnostic symptoms that include a change to a walking gait. The disease is problematic to diagnose. An objective method of monitoring the gait of a patient is required to ensure the effectiveness of diagnosis and treatments. We examine the suitability of Extreme Gradient Boosting (XGBoost) and Artificial Neural Network (ANN) Models compared to Symbolic Regression (SR) using genetic programming that was demonstrated to be successful in previous works on gait. The XGBoost and ANN models are found to out-perform SR, but the SR model is more human explainable.
- Research Article
9
- 10.3390/en15239008
- Nov 28, 2022
- Energies
The integration of Photovoltaic (PV) systems requires the implementation of potential PV power forecasting techniques to deal with the high intermittency of weather parameters. In the PV power prediction process, Genetic Programming (GP) based on the Symbolic Regression (SR) model has a widespread deployment since it provides an effective solution for nonlinear problems. However, during the training process, SR models might miss optimal solutions due to the large search space for the leaf generations. This paper proposes a novel hybrid model that combines SR and Deep Multi-Layer Perceptron (MLP) for one-month-ahead PV power forecasting. A case study analysis using a real Australian weather dataset was conducted, where the employed input features were the solar irradiation and the historical PV power data. The main contributions of the proposed hybrid SR-MLP algorithm are as follows: (1) The training speed was significantly improved by eliminating unimportant inputs during the feature selection process performed by the Extreme Boosting and Elastic Net techniques; (2) The hyperparameters were preserved throughout the training and testing phases; (3) The proposed hybrid model made use of a reduced number of layers and neurons while guaranteeing a high forecasting accuracy; (4) The number of iterations due to the use of SR was reduced. The presented simulation results demonstrate the higher forecasting accuracy (reductions of more than 20% for Root Mean Square Error (RMSE) and 30% for Mean Absolute Error (MAE) in addition to an improvement in the R2 evaluation metric) and robustness (preventing the SR from converging to local minima with the help of the ANN branch) of the proposed SR-MLP model as compared to individual SR and MLP models.
- Research Article
15
- 10.1016/j.watres.2021.117965
- Dec 15, 2021
- Water Research
A CFD-ML augmented alternative to residence time for clarification basin scaling and design
- Research Article
5
- 10.1016/j.foodchem.2012.02.053
- Feb 16, 2012
- Food Chemistry
Calibration of artificial neural network and partial least squares regression models for the prediction of secoisolariciresinol diglucoside contents in microwave-assisted extracts of various flaxseed (Linum usitatissimum L.) samples
- Research Article
21
- 10.1186/s12911-021-01667-8
- Nov 2, 2021
- BMC Medical Informatics and Decision Making
Background: Early identification of the occurrence of arrhythmia in patients with acute myocardial infarction plays an essential role in clinical decision-making. The present study attempted to use machine learning (ML) methods to build predictive models of arrhythmia after acute myocardial infarction (AMI). Methods: A total of 2084 patients with acute myocardial infarction were enrolled in this study. (All data is available on Github: https://github.com/wangsuhuai/AMI-database1.git). The primary outcome is whether tachyarrhythmia occurred during admission containing atrial arrhythmia, ventricular arrhythmia, and supraventricular tachycardia. All data is randomly divided into a training set (80%) and an internal testing set (20%). Apply three machine learning algorithms: decision tree, random forest (RF), and artificial neural network (ANN) to learn the training set to build a model, then use the testing set to evaluate the prediction performance, and compare it with the model built by the Global Registry of Acute Coronary Events (GRACE) risk variable set. Results: Three ML models predict the occurrence of tachyarrhythmias after AMI. After variable selection, the artificial neural network (ANN) model has reached the highest accuracy rate, which is better than the model constructed using the Grace variable set. After applying SHapley Additive exPlanations (SHAP) to make the model interpretable, the most important features are abnormal wall motion, lesion location, bundle branch block, age, and heart rate. Among them, RBBB (odds ratio [OR]: 4.21; 95% confidence interval [CI]: 2.42–7.02), ≥ 2 ventricular walls motion abnormal (OR: 3.26; 95% CI: 2.01–4.36) and right coronary artery occlusion (OR: 3.00; 95% CI: 1.98–4.56) are significant factors related to arrhythmia after AMI. Conclusions: We used advanced machine learning methods to build prediction models for tachyarrhythmia after AMI for the first time (especially the ANN model that has the best performance). The current study can supplement the current AMI risk score, provide a reliable evaluation method for the clinic, and broaden the new horizons of ML and clinical research. Trial registration: Clinical Trial Registry No.: ChiCTR2100041960.
- Research Article
15
- 10.1093/annweh/wxaa097
- Oct 23, 2020
- Annals of Work Exposures and Health
Respirable crystalline silica (RCS) overexposure can lead to the development of silicosis, which is a chronic, irreversible, potentially fatal respiratory disease. The most significant prerequisite for any silica exposure control plan is an accurate occupational exposure assessment. The results of crystalline silica analysis are often affected by other mineral interferences and are influenced by an analyst's knowledge of mineralogy to accurately interpret infrared spectra and correct matrix interferences. Partial least squares (PLS) and artificial neural networks (ANNs) are two multivariate calibration methods to overcome the problem of spectral interferences without the need for analyst intervention. The performance of these two methods in quantitative analysis of quartz in the presence of mineral interferences was evaluated and compared in this study. Fifty mixtures with different crystalline silica content ratios were prepared by mixing quartz with four common mineral interferences including kaolinite, albite, muscovite, and amorphous silica. Fourier-transform infrared spectra of the mixtures were split into training and test datasets. The optimal architecture of the ANN model was achieved using a two-level full factorial design experiment and data were modeled using ANN and PLS regression analysis. Root mean squared error of prediction values of 1.69 and 6.12 µg quartz for ANN and PLS models, respectively, showed that both models performed very well in quantitative analysis of quartz in the presence of mineral interferences, with a better relative performance of the ANN model, which can be related to the inherent nonlinear predictive ability of ANNs. Given the excellent predictive ability of the ANN model, which can deal with a completely overlapped peak without any need for user intervention, it is recommended that the ANN model be optimized in future studies and utilized for reliable and rapid on-field assessment of RCS exposure.
- Conference Article
4
- 10.1109/cec45853.2021.9504767
- Jun 28, 2021
Data incompleteness is a pervasive problem in symbolic regression, and machine learning in general. Unfortunately, most symbolic regression methods are only applicable when the given data is complete. One common approach to handling this situation is data imputation. It works by estimating missing values based on existing data. However, which existing data should be used for imputing the missing values? The answer to this question is important when dealing with incomplete data. To address this question, this work proposes a mixed tree-vector representation for genetic programming to perform instance selection and symbolic regression on incomplete data. In this representation, each individual has two components: an expression tree and a bit vector. While the tree component constructs symbolic regression models, the vector component selects the instances that are used to impute missing values by the weighted k-nearest neighbour (WKNN) imputation method. The complete imputed instances are then used to evaluate the GP-based symbolic regression model. The obtained experimental results show the applicability of the proposed method on real-world data sets with different missingness scenarios. When compared with existing methods, the proposed method not only produces more effective symbolic regression models but also achieves more efficient imputations.
- Research Article
16
- 10.1186/s10033-023-00876-8
- Mar 27, 2023
- Chinese Journal of Mechanical Engineering
Machine learning (ML) has powerful nonlinear processing and multivariate learning capabilities, so it has been widely utilised in the fatigue field. However, most ML methods are inexplicable black-box models that are difficult to apply in engineering practice. Symbolic regression (SR) is an interpretable machine learning method for determining the optimal fitting equation for datasets. In this study, domain knowledge-guided SR was used to determine a new fatigue crack growth (FCG) rate model. Three terms of the variable subtree of ΔK, R-ratio, and ΔKth were obtained by analysing eight traditional semi-empirical FCG rate models. Based on the FCG rate test data from other literature, the SR model was constructed using Al-7055-T7511. It was subsequently extended to other alloys (Ti-10V-2Fe-3Al, Ti-6Al-4V, Cr-Mo-V, LC9cs, Al-6013-T651, and Al-2324-T3) using multiple linear regression. Compared with the three semi-empirical FCG rate models, the SR model yielded higher prediction accuracy. This result demonstrates the potential of domain knowledge-guided SR for building the FCG rate model.
- Research Article
11
- 10.1038/s41598-024-64386-w
- Jun 25, 2024
- Scientific Reports
This study explores machine learning (ML) capabilities for predicting the shear strength of reinforced concrete deep beams (RCDBs). For this purpose, eight typical machine-learning models, i.e., symbolic regression (SR), XGBoost (XGB), CatBoost (CATB), random forest (RF), LightGBM, support vector regression (SVR), artificial neural networks (ANN), and Gaussian process regression (GPR) models, are selected and compared based on a database of 840 samples with 14 input features. The hyperparameter tuning of the introduced ML models is performed using the Bayesian optimization (BO) technique. The comparison results show that the CatBoost model is the most reliable and accurate ML model (R2 = 0.997 and 0.947 in the training and testing sets, respectively). In addition, simple and practical design expressions for RCDBs have been proposed based on the SR model with a physical meaning and acceptable accuracy (an average prediction-to-test ratio of 0.935 and a standard deviation of 0.198). Meanwhile, the shear strength predicted by ML models was then compared with classical mechanics-driven shear models, including two prominent practice codes (i.e., ACI318, EC2) and two previous mechanical models, which indicated that the ML approach is highly reliable and accurate over conventional methods. In addition, a reliability-based design was conducted on two ML models, and their reliability results were compared with those of two code standards. The findings revealed that the ML models demonstrate higher reliability compared to code standards.
- Research Article
173
- 10.1016/j.cageo.2012.11.015
- Nov 28, 2012
- Computers & Geosciences
Monthly river flow forecasting using artificial neural network and support vector regression models coupled with wavelet transform
- Research Article
90
- 10.3171/2013.1.jns121130
- Feb 1, 2013
- Journal of Neurosurgery
Most reports compare artificial neural network (ANN) models and logistic regression models in only a single data set, and the essential issue of internal validity (reproducibility) of the models has not been adequately addressed. This study proposes to validate the use of the ANN model for predicting in-hospital mortality after traumatic brain injury (TBI) surgery and to compare the predictive accuracy of ANN with that of the logistic regression model. The authors of this study retrospectively analyzed 16,956 patients with TBI nationwide who were surgically treated in Taiwan between 1998 and 2009. For every 1000 pairs of ANN and logistic regression models, the area under the receiver operating characteristic curve (AUC), Hosmer-Lemeshow statistics, and accuracy rate were calculated and compared using paired t-tests. A global sensitivity analysis was also performed to assess the relative importance of input parameters in the ANN model and to rank the variables in order of importance. The ANN model outperformed the logistic regression model in terms of accuracy in 95.15% of cases, in terms of Hosmer-Lemeshow statistics in 43.68% of cases, and in terms of the AUC in 89.14% of cases. The global sensitivity analysis of in-hospital mortality also showed that the most influential (sensitive) parameters in the ANN model were surgeon volume followed by hospital volume, Charlson comorbidity index score, length of stay, sex, and age. This work supports the continued use of ANNs for predictive modeling of neurosurgery outcomes. However, further studies are needed to confirm the clinical efficacy of the proposed model.
- Conference Article
2
- 10.1109/micai.2014.33
- Nov 1, 2014
Symbolic regression is an application of genetic programming and is used for modeling different dynamic processes. Industrial process problems have been solved using this technique. In this work a symbolic regression algorithm is used for modeling the synthesis process of the oxides Bi2MoO6 and V2O5 in order to provide a model. These oxides are used in heterogeneous photocatalysis. Genetic programming, artificial neural network and linear regression are compared with symbolic regression models using statistical metrics to evaluate them.
- Research Article
14
- 10.1038/s41598-024-53352-1
- Feb 5, 2024
- Scientific Reports
Concrete-filled steel tubular (CFST) columns have extensive applications in structural engineering due to their exceptional load-bearing capability and ductility. However, existing design code standards often yield different design capacities for the same column properties, introducing uncertainty for engineering designers. Moreover, conventional regression analysis fails to accurately predict the intricate relationship between column properties and compressive strength. To address these issues, this study proposes the use of two machine learning (ML) models—Gaussian process regression (GPR) and symbolic regression (SR). These models accept a variety of input variables, encompassing geometric and material properties of stub CFST columns, to estimate their strength. An experimental database of 1316 specimens was compiled from various research papers, including circular, rectangular, and double-skin stub CFST columns. In addition, a dimensionless output variable, referred to as the strength index, is introduced to enhance model performance. To validate the efficiency of the introduced models, predictions from these models are compared with those from two established standard codes and various ML algorithms, including support vector regression optimized with particle swarm optimization (PSVR), artificial neural networks, XGBoost (XGB), CatBoost (CATB), Random Forest, and LightGBM models. Through performance metrics, the CATB, GPR, PSVR and XGB models emerge as the most accurate and reliable models from the evaluation results. In addition, simple and practical design equations for the different types of CFST columns have been proposed based on the SR model. The developed ML models and proposed equations can predict the compressive strength of stub CFST columns with reliable and accurate results, making them valuable tools for structural engineering. Furthermore, the Shapley additive interpretation (SHAP) technique is employed for feature analysis. The results of the feature analysis reveal that section slenderness ratio and concrete strength parameters negatively impact the compressive strength index.
- Research Article
- 10.11591/ijai.v14.i1.pp286-297
- Feb 1, 2025
- IAES International Journal of Artificial Intelligence (IJ-AI)
Precise identification of customer churn is crucial for e-commerce companies due to the high costs associated with acquiring new customers. In this sector, where revenues are affected by customer churn, the challenge is intensified by the diversity of product choices offered on various marketplaces. Customers can easily switch from one platform to another, emphasizing the need for accurate churn classification to anticipate revenue fluctuations in e-commerce. In this context, this study proposes seven machine learning classification models to predict customer churn, including decision tree (DT), random forest (RF), support vector machine (SVM), logistic regression (LR), naïve Bayes (NB), k-nearest neighbors (K-NN), and artificial neural network (ANN). The performances of the models were evaluated using confusion matrix, accuracy, precision, recall, and F1-score. The results indicated that the ANN model achieves the highest accuracy at 92.09%, closely followed by RF at 91.21%. In contrast, the NB model performed the least favorably with an accuracy of 75.04%. Two explainable artificial intelligence (XAI) methods, Shapley additive explanations (SHAP) and local interpretable model-agnostic explanations (LIME), were used to explain the models. SHAP provided global explanations for both ANN and RF models through Kernel SHAP and Tree SHAP. LIME, offering local explanations, was applied only to the ANN model which gave better accuracy.
- Research Article
13
- 10.1016/j.jaap.2024.106486
- Mar 30, 2024
- Journal of Analytical and Applied Pyrolysis
Machine learning prediction of bio-oil production from the pyrolysis of lignocellulosic biomass: Recent advances and future perspectives
- Research Article
- 10.1016/j.biortech.2025.133579
- Nov 6, 2025
- Bioresource Technology
- Research Article
- 10.1016/j.biortech.2025.133625
- Nov 6, 2025
- Bioresource Technology
- Research Article
- 10.1016/j.biortech.2025.132878
- Nov 1, 2025
- Bioresource Technology
- Research Article
- 10.1016/j.biortech.2025.132867
- Nov 1, 2025
- Bioresource Technology
- Research Article
- 10.1016/j.biortech.2025.132993
- Nov 1, 2025
- Bioresource Technology
- Research Article
- 10.1016/j.biortech.2025.133601
- Nov 1, 2025
- Bioresource Technology
- Research Article
- 10.1016/s0960-8524(25)01061-2
- Nov 1, 2025
- Bioresource Technology
- Research Article
- 10.1016/j.biortech.2025.133050
- Nov 1, 2025
- Bioresource Technology
- Research Article
- 10.1016/j.biortech.2025.132922
- Nov 1, 2025
- Bioresource Technology
- Research Article
- 10.1016/j.biortech.2025.133021
- Nov 1, 2025
- Bioresource Technology