The flotation grade prediction model based on mechanism-guided and data-driven approaches


Similar Papers
  • Research Article
  • Cite count: 43
  • 10.1016/j.eja.2022.126569
Mixing process-based and data-driven approaches in yield prediction
  • Jul 8, 2022
  • European Journal of Agronomy
  • Bernardo Maestrini + 6 more

Yield prediction models can be divided between data-driven and process-based models (crop growth models). The first category contains many different types of models with parameters learned from the data themselves and where domain knowledge is only used to select the predictors and engineer features. In the second category, models are based upon biophysical principles, whose structure and parameters are derived primarily from domain knowledge. Here we investigate whether integrating the two approaches can be beneficial, because it can overcome the limitations of each approach taken individually: the lack of sufficiently large, reliable and orthogonal datasets for data-driven approaches and the need for many inputs for process-based models. The applications of the two categories of models have been reviewed, paying special attention to the cases where the two approaches have been mixed. By analysing the literature we identified three major cases of integration between the two approaches: (1) using crop growth models to engineer features and expand the predictor space, (2) using data-driven approaches to estimate missing inputs for process-based models, and (3) using data-driven approaches to produce meta-models to reduce the computational burden. Finally, we propose a methodology based on metamodels and transfer learning to integrate data-driven and process-based approaches.

  • Research Article
  • Cite count: 1
  • 10.1175/mwr-d-24-0005.1
Physics-Based vs Data-Driven 24-Hour Probabilistic Forecasts of Precipitation for Northern Tropical Africa
  • Sep 1, 2024
  • Monthly Weather Review
  • Eva-Maria Walz + 4 more

Numerical weather prediction (NWP) models struggle to skillfully predict tropical precipitation occurrence and amount, calling for alternative approaches. For instance, it has been shown that fairly simple, purely data-driven logistic regression models for 24-h precipitation occurrence outperform both climatological and NWP forecasts for the West African summer monsoon. More complex neural network–based approaches, however, remain underdeveloped due to the non-Gaussian character of precipitation. In this study, we develop, apply, and evaluate a novel two-stage approach, where we train a U-Net convolutional neural network (CNN) model on gridded rainfall data to obtain a deterministic forecast and then apply the recently developed, nonparametric Easy Uncertainty Quantification (EasyUQ) approach to convert it into a probabilistic forecast. We evaluate CNN+EasyUQ for 1-day-ahead 24-h accumulated precipitation forecasts over northern tropical Africa for 2011–19, with the Integrated Multi-satellitE Retrievals for GPM (IMERG) data serving as ground truth. In the most comprehensive assessment to date, we compare CNN+EasyUQ to state-of-the-art physics-based and data-driven approaches such as monthly probabilistic climatology, raw and postprocessed ensemble forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF), and traditional statistical approaches that use up to 25 predictor variables from IMERG and the ERA5 reanalysis. Generally, statistical approaches perform about on par with postprocessed ECMWF ensemble forecasts. The CNN+EasyUQ approach, however, clearly outperforms all competitors in terms of both occurrence and amount. Hybrid methods that merge CNN+EasyUQ and physics-based forecasts show slight further improvement. Thus, the CNN+EasyUQ approach can likely improve operational probabilistic forecasts of rainfall in the tropics and potentially even beyond. 
Significance Statement: Precipitation forecasts in the tropics remain a great challenge despite their enormous potential to create socioeconomic benefits in sectors such as food and energy production. Here, we develop a purely data-driven, machine learning–based prediction model that outperforms traditional, physics-based approaches to 1-day-ahead forecasts of rainfall occurrence and rainfall amount over northern tropical Africa in terms of both forecast skill and computational costs. A combined data-driven and physics-based (hybrid) approach yields further (slight) improvement in terms of forecast skill. These results suggest new avenues to more accurate and more resource-efficient operational precipitation forecasts in the Global South.

  • Conference Article
  • Cite count: 1
  • 10.56952/arma-2024-0315
Hybrid Strategies for Interpretability of Rate of Penetration Prediction: Automated Machine Learning and SHAP Interpretation
  • Jun 23, 2024
  • Hao Hu + 6 more

ABSTRACT: Accurate prediction of rate of penetration (ROP) during petroleum drilling is crucial to optimize and guide field operations. However, due to the complex nonlinear relationship between drilling parameters and ROP, traditional empirical models often struggle to accurately predict ROP. This study introduces an automated machine learning (AutoML) approach for ROP prediction and utilizes SHAP (SHapley Additive exPlanations) to interpret the prediction results. The workflow framework based on this collaborative prediction strategy enables automated processing of data and automatic stacking ensemble of multiple machine learning models. It adaptively selects the optimal model after comprehensive validation without human intervention, thereby significantly reducing the time spent on model selection and hyperparameter optimization for ROP prediction. The results indicate that the weighted ensemble model, built with level-3 stacking and 5-fold cross-validation, achieves the best prediction accuracy: RMSE = 1.86, MSE = 3.47. SHAP provides a global explanation for the model's prediction results, making the results of the automated prediction workflow more convincing and interpretable. This study provides automated machine learning workflow ideas for accurate prediction of ROP so that researchers can focus more on the business scenario itself without excessive machine learning knowledge and frequent manual intervention.
1. INTRODUCTION
In the field of petroleum drilling, the rate of penetration (ROP) is a crucial indicator that reflects the speed at which the drill bit penetrates and breaks through the rock formation. It plays a pivotal role in measuring drilling efficiency. Accurate prediction of ROP is essential for optimizing drilling parameters during the drilling process, which can effectively improve efficiency and reduce costs (Li et al., 2022; H. Zhang et al., 2021; Kuang et al., 2021).
With the development of oil drilling technology and modern data science and technology, the prediction methods for ROP have undergone distinct stages of development: empirical or physical models, prediction models that combine physical and data-driven approaches, and machine learning models (Boukredera et al., 2023; Ahmed et al., 2019). In the realm of equation-based prediction using conventional methods, several physics-based models such as the B-Y ROP equation (Bourgoyne & Young, 1974), MSE equation (Caicedo et al., 2005), and Motahhari equation (Motahhari et al., 2010) have certain limitations. These models may not consider all the factors comprehensively, making it challenging to adapt to complex downhole scenarios. Hegde et al. (2017) compared three physics-based traditional models with data-driven models using a combination of physics and data-driven modeling approaches. In terms of using machine learning to predict ROP, various machine learning algorithms such as random forest (RF), support vector machine (SVM), and neural network (NN) have been employed (Moran et al., 2010; Ashrafi et al., 2019; Brenjkar & Biniaz Delijani, 2022; Tunkiel et al., 2022; C. Zhang et al., 2023; Wan et al., 2023). The results indicate that machine learning outperforms traditional models (Soares & Gray, 2019). Bizhani and Kuru (2022) explored the application of Bayesian neural networks in ROP prediction, focusing on the concept of model prediction uncertainty. Duru (2022) optimized five machine learning algorithms (linear regression, decision tree, support vector machine, random forest and multilayer perceptron) with a genetic algorithm, and the results showed enhanced prediction performance of the models. Gan et al. (2023) proposed a novel ROP modeling approach called hybrid bat algorithm optimized - restricted Boltzmann machine - back propagation neural network. Qu et al. (2023) improved the backpropagation neural network (BP) and utilized it for ROP prediction.

  • Research Article
  • Cite count: 3
  • 10.1016/j.jclepro.2024.144332
A data-driven hybrid approach towards developing a circular economy diffusion model for the building construction industry
  • Nov 26, 2024
  • Journal of Cleaner Production
  • Benjamin I Oluleye + 2 more


  • Conference Article
  • 10.2118/218116-ms
Prediction of Mineralogical Composition in Heterogeneous Unconventional Reservoirs: Comparisons Between Data-Driven and Chemistry-Based Models
  • Mar 12, 2024
  • Christopher R Clarkson + 3 more

Prediction of mineralogical compositions along multi-fractured horizontal wells (MFHWs) using indirect methods, for the purpose of characterizing lithological and rock brittleness heterogeneity, is appealing due to the challenges associated with direct mineralogical evaluation. This study aims to 1) develop predictive machine learning models for indirect estimation of mineralogical compositions from elemental compositions, 2) compare mineralogical compositions obtained from data-driven and chemistry-based approaches, and 3) provide practical recommendations for fine-tuning and training of data-driven models. Leveraging recent advances in deep learning, an attention-based gated recurrent unit (AttnGRU) with a "feature extractor-post processor" architecture was developed for predicting compositions of ten primary minerals based on elemental data. For comparison, classic regression-based and ensemble learning models including support vector regression (SVR), random forest (RF), and a feedforward neural network (FFNN) were utilized. Data-driven models were trained and tested using XRD data measured on 217 samples from the Montney Formation, and the outcomes were compared to those derived from stoichiometric material balance equations (a previously developed chemistry-based model) to evaluate the effectiveness and capabilities of different predictive approaches. The data-driven models consistently outperformed the chemistry-based method with significantly lower mean absolute error (MAE) and higher R². The predictive performance order was FFNN ≥ AttnGRU > RF > SVR >> chemistry-based model, with MAE = 1.05, 1.09, 1.24, 1.35, and 2.46 wt.%, respectively. Importantly, FFNN, AttnGRU and RF offered more accurate predictions of chlorite and illite, which are known to negatively affect reservoir quality. This indicates the superior performance of the three models for reservoir characterization applications.
Furthermore, AttnGRU exhibited greater robustness than the other two models, with less sensitivity to overfitting issues. Data-driven models displayed different levels of performance when decreasing training dataset size. It is recommended that, in order to achieve reasonable predictions for the studied reservoir with data-driven approaches, more than 50 training samples be used. It is further observed that data-driven models exhibited limited predictive capability (MAEs ranging from 3.02-3.45 wt.%) when applied to a synthetic "global dataset" comprised of samples from various formations. Through the comparison of multiple independent datasets (XRF-derived chemistry-based, XRF-derived data-driven, XRD) collected on identical samples, this work highlights the strengths, limitations, and capabilities of different machine learning techniques for along-well estimation of mineralogical composition to assist with reservoir characterization.

  • Book Chapter
  • Cite count: 1
  • 10.1007/978-981-19-3866-5_36
Application of Artificial Intelligence for Failure Prediction of Engine Through Condition Monitoring Technique
  • Oct 4, 2022
  • Suvendu Mohanty + 1 more

Engine failure prediction, to date, has become more challenging for adequately diagnosing and assigning appropriate maintenance decision-making processes. This paper investigates the health of an engine through experimental observation using an artificial neural network (ANN). Lubricating oil analysis has been performed for diagnosing quantitative analysis, i.e. wear particle concentration (WPC), severity index (SI), wear severity index (WSI), and percentage of large particles (PLP). An ANN model using a nonlinear autoregressive with exogenous input (NARX) architecture has been employed to predict quantitative outputs. Finally, a data-driven approach by applying an artificial neural network to understand the system degradation from accumulated condition monitoring data is studied. Topology 3–18–4 from NARX (ANN) was optimal in developing a predictive failure model with regression coefficients (0.9985–0.9999), having an error autocorrelation factor bounded within 95% confidence limit and lowered MSE and MAPE values as 0.00093 and 3.56. The application of neural networks is increasingly attractive and seems to be the right choice for a data-driven diagnostic approach. In addition, the outcomes from the ANN data are validated with the experimental set so that the strength of the model is reflected and a pattern of failure from the historical monitoring of the operating engines is predicted.
Keywords: Condition monitoring; Failure; Diagnosis; NARX; ANN

  • Preprint Article
  • 10.5194/egusphere-egu22-7530
Identifying relevant large-scale predictors for sub-seasonal precipitation forecast using explainable neural networks
  • Mar 28, 2022
  • Niclas Rieger + 6 more

The last few years have seen an ever growing interest in weather predictions on sub-seasonal time scales ranging from 2 weeks to about 2 months. By forecasting aggregated weather statistics, such as weekly precipitation, it has indeed become possible to overcome the theoretical predictability limit of 2 weeks, bringing life to time scales which historically have been known as the “predictability desert”. The growing success at these time scales is largely due to the identification of weather and climate processes providing sub-seasonal predictability, such as the Madden-Julian Oscillation (MJO) and anomaly patterns of global sea surface temperature (SST), sea surface salinity, soil moisture and snow cover. Although much has been gained by these studies, a comprehensive analysis of potential predictors and their relative relevance to forecast sub-seasonal rainfall is still missing.
At the same time, data-driven machine learning (ML) models have proved to be excellent candidates to tackle two common challenges in weather forecasting: (i) resolving the non-linear relationships inherent to the chaotic climate system and (ii) handling the steadily growing amounts of Earth observational data. Not surprisingly, a variety of studies have already displayed the potential of ML models to improve the state-of-the-art dynamical weather prediction models currently in use for sub-seasonal predictions, in particular for temperatures, precipitation and the MJO. It seems therefore inevitable that the future of sub-seasonal prediction lies in the combination of both the dynamical, process-based and the statistical, data-driven approach.
In the advent of this new age of combined Neural Earth System Modeling, we want to provide insight and guidance for future studies (i) to what extent large-scale teleconnections on the sub-seasonal scale can be resolved by purely data-driven models and (ii) what the relative contributions of the individual large-scale predictors are to make a skillful forecast. To this end, we build neural networks to predict sub-seasonal precipitation based on a variety of large-scale predictors derived from oceanic, atmospheric and terrestrial sources. As a second step, we apply layer-wise relevance propagation to examine the relative importance of different climate modes and processes in skillful forecasts.
Preliminary results show that the skill of our data-driven ML approach is comparable to state-of-the-art dynamical models suggesting that current operational models are able to correctly model large-scale teleconnections within the climate system. The ML model achieves highest skills over the tropical Pacific, the Maritime Continent and the Caribbean Sea, in agreement with dynamical models. By investigating the relative importance of those large-scale predictors for skillful predictions, we find that the MJO and processes associated with SST anomalies like the El Niño-Southern Oscillation, the Pacific decadal oscillation and the Atlantic meridional mode all play an important role for individual regions along the tropics.

  • Research Article
  • Cite count: 6
  • 10.1287/ijoo.2021.0066
Optimal Order Batching in Warehouse Management: A Data-Driven Robust Approach
  • Jan 7, 2022
  • INFORMS Journal on Optimization
  • Vedat Bayram + 3 more

Optimizing warehouse processes has direct impact on supply chain responsiveness, timely order fulfillment, and customer satisfaction. In this work, we focus on the picking process in warehouse management and study it from a data perspective. Using historical data from an industrial partner, we introduce, model, and study the robust order batching problem (ROBP) that groups orders into batches to minimize total order processing time accounting for uncertainty caused by system congestion and human behavior. We provide a generalizable, data-driven approach that overcomes warehouse-specific assumptions characterizing most of the work in the literature. We analyze historical data to understand the processes in the warehouse, to predict processing times, and to improve order processing. We introduce the ROBP and develop an efficient learning-based branch-and-price algorithm based on simultaneous column and row generation, embedded with alternative prediction models such as linear regression and random forest that predict processing time of a batch. We conduct extensive computational experiments to test the performance of the proposed approach and to derive managerial insights based on real data. The data-driven prescriptive analytics tool we propose achieves savings of seven to eight minutes per order, which translates into a 14.8% increase in daily picking operations capacity of the warehouse.

  • Research Article
  • Cite count: 6
  • 10.1016/j.engfailanal.2022.106884
Data-driven prediction approach for RC beam performance under low velocity impact loading
  • Oct 20, 2022
  • Engineering Failure Analysis
  • Jingfeng Zhang + 3 more


  • Research Article
  • Cite count: 9
  • 10.1016/j.earscirev.2024.104948
Displacement prediction of landslides at slope-scale: Review of physics-based and data-driven approaches
  • Oct 5, 2024
  • Earth-Science Reviews
  • Wenping Gong + 4 more


  • Research Article
  • Cite count: 2
  • 10.3390/app10165696
Partially versus Purely Data-Driven Approaches in SARS-CoV-2 Prediction
  • Aug 17, 2020
  • Applied Sciences
  • Samar A Shilbayeh + 2 more

Prediction models of coronavirus disease utilizing machine learning algorithms range from forecasting future suspect cases and predicting mortality rates to building a pattern for a country-specific pandemic end date. To predict the future suspect infection and death cases, we categorized the approaches found in the literature into two groups. The first is a purely data-driven approach, whose goal is to build a mathematical model that relates the data variables, including outputs with inputs, to detect general patterns. The discovered patterns can then be used to predict the future infected cases without any expert input. The second approach is partially data-driven; it uses historical data, but allows expert input such as the SIR epidemic algorithm. This approach assumes that the epidemic will end according to medical reasoning. In this paper, we compare the purely data-driven and partially data-driven approaches by applying them to data from three countries having different past pattern behavior. The countries are the US, Jordan, and Italy. It is found that those two prediction approaches yield significantly different results. The purely data-driven approach depends totally on the past behavior and does not show any decline in the number of the infected cases if the country did not experience any decline in the number of cases. On the other hand, a partially data-driven approach guarantees a timely decline of the infected curve to reach zero. Using the two approaches highlights the importance of human intervention in pandemic prediction to guide the learning process as opposed to the purely data-driven approach that predicts future cases based on the pattern detected in the data.
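
The abstract above names the SIR epidemic model as the expert input of the partially data-driven approach. The following is a minimal sketch of the standard SIR compartmental equations, not the authors' implementation: the function name `simulate_sir`, the parameter values (`beta`, `gamma`, the population of 10,000) and the forward Euler integration are all illustrative assumptions. It shows why such a model, by construction, produces an infected curve that peaks and then declines toward zero.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the SIR equations with forward Euler.

    beta  : transmission rate per day (assumed value, not from the paper)
    gamma : recovery rate per day (assumed value, not from the paper)
    Returns a list of (S, I, R) tuples, one per day, starting at day 0.
    """
    n = s0 + i0 + r0  # total population, constant in the basic SIR model
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = int(round(1 / dt))
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt  # S -> I flow
            new_recoveries = gamma * i * dt         # I -> R flow
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative run: basic reproduction number beta / gamma = 3.
history = simulate_sir(beta=0.3, gamma=0.1, s0=9990, i0=10, r0=0, days=300)
infected = [i for _, i, _ in history]
peak_day = max(range(len(infected)), key=infected.__getitem__)
print(f"peak on day {peak_day}, infected at end: {infected[-1]:.6f}")
```

In a partially data-driven workflow, `beta` and `gamma` would typically be fitted to the historical case counts, while the model structure itself supplies the eventual decline that a purely pattern-based extrapolation may not show.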

  • Research Article
  • Cite count: 1
  • 10.1002/jcsm.13599
Feature Engineering for the Prediction of Scoliosis in 5q-Spinal Muscular Atrophy.
  • Dec 5, 2024
  • Journal of Cachexia, Sarcopenia and Muscle
  • Tu-Lan Vu-Han + 9 more

5q-Spinal muscular atrophy (SMA) is now one of the 5% treatable rare diseases worldwide. As disease-modifying therapies alter disease progression and patient phenotypes, paediatricians and consulting disciplines face new unknowns in their treatment decisions. Conclusions made from historical patient data sets are now mostly limited, and new approaches are needed to ensure our continued best standard-of-care practices for this exceptional patient group. Here, we present a data-driven machine learning approach to a rare disease data set to predict spinal muscular atrophy (SMA)-associated scoliosis. We collected data from 84 genetically confirmed 5q-SMA patients who have received novel SMA therapies. We performed expert domain knowledge-directed feature engineering, correlation and predictive power score (PPS) analyses for feature selection. To test the predictive performance of the selected features, we trained a Random Forest Classifier and evaluated model performance using standard metrics. The SMA data set consisted of 1304 visits and over 360 variables. We performed feature engineering for variables related to 'interventions', 'devices', 'orthosis', 'ventilation', 'muscle contractures' and 'motor milestones'. Through correlation and PPS analysis paired with expert domain knowledge feature selection, we identified relevant features for scoliosis prediction in SMA that included disease progression markers: Hammersmith Functional Motor Scale Expanded 'HFMSE' (PPS = 0.27) and 6-Minute Walk Test '6MWT' scores (PPS = 0.44), 'age' (PPS = 0.41) and 'weight' (PPS = 0.49), 'contractures' (PPS = 0.17), the use of 'assistive devices' (PPS = 0.39), 'ventilation' (PPS = 0.16) and the presence of 'gastric tubes' (PPS = 0.35) in SMA patients. These features were validated using expert domain knowledge and used to train a Random Forest Classifier with an observed accuracy of 0.82 and an average receiver operating characteristic (ROC) area of 0.87.
The introduction of disease-modifying SMA therapies, followed by the implementation of SMA in newborn screenings, has presented physicians with never-seen patients. We used feature engineering tools to overcome one of the main challenges when using data-driven approaches in rare disease data sets. Through predictive modelling of this data, we defined disease progression markers, which are easily assessed during patient visits and can help anticipate scoliosis onset. This highlights the importance of progressive features in the drug-induced revolution of this rare disease and further supports the ongoing efforts to update the SMA classification. We advocate for the consistent documentation of relevant progression markers, which will serve as a basis for data-driven models that physicians can use to update their best standard-of-care practices.

  • Research Article
  • Cite count: 6
  • 10.1016/j.jcrc.2016.01.021
Early prediction of extracorporeal membrane oxygenation eligibility for severe acute respiratory distress syndrome in adults
  • Jan 27, 2016
  • Journal of Critical Care
  • J Kyle Bohman + 7 more


  • Preprint Article
  • 10.5194/ems2022-110
Identifying relevant large-scale predictors for sub-seasonal precipitation forecast using explainable neural networks
  • Jun 28, 2022
  • Niclas Rieger + 5 more

The last few years have seen an ever growing interest in weather predictions on sub-seasonal time scales ranging from 2 weeks to about 2 months. By forecasting aggregated weather statistics, such as weekly precipitation, it has indeed become possible to overcome the theoretical predictability limit of 2 weeks, bringing life to time scales which historically have been known as the “predictability desert”. The growing success at these time scales is largely due to the identification of weather and climate processes providing sub-seasonal predictability, such as the Madden-Julian Oscillation (MJO) and anomaly patterns of global sea surface temperature (SST), sea surface salinity (SSS), soil moisture and snow cover. Although much has been gained by these studies, a comprehensive analysis of potential predictors and their relative relevance to forecast sub-seasonal rainfall is still missing.
At the same time, data-driven machine learning (ML) models have proved to be excellent candidates to tackle two common challenges in weather forecasting: (i) resolving the non-linear relationships inherent to the chaotic climate system and (ii) handling the steadily growing amounts of Earth observational data. Not surprisingly, a variety of studies have already displayed the potential of ML models to improve the state-of-the-art dynamical weather prediction models currently in use for sub-seasonal predictions, in particular for temperatures, precipitation and the MJO. It seems therefore inevitable that the future of sub-seasonal prediction lies in the combination of both the dynamical, process-based and the statistical, data-driven approach.
In the advent of this new age of combined Neural Earth System Modeling, we want to provide insight and guidance for future studies (i) to what extent large-scale teleconnections on the sub-seasonal scale can be resolved by purely data-driven models and (ii) what the relative contributions of the individual large-scale predictors are to make a skillful forecast. To this end, we build neural networks to predict sub-seasonal precipitation based on a variety of large-scale predictors derived from oceanic, atmospheric and terrestrial sources. As a second step, we apply layer-wise relevance propagation to examine the relative importance of different climate modes and processes in skillful forecasts.
Preliminary results show that the skill of our data-driven ML approach is comparable to state-of-the-art dynamical models suggesting that current operational models are able to correctly model large-scale teleconnections within the climate system. The ML model achieves highest skills over the tropical Pacific and the western part of North America. By investigating the relative importance of those large-scale predictors for skillful predictions, we find that the MJO and slow-varying processes associated with SST and SSS anomalies like the El Niño-Southern Oscillation, the Pacific decadal oscillation and the Atlantic meridional mode all play an important role for individual regions.

  • Research Article
  • Cite count: 143
  • 10.1016/j.apenergy.2014.03.084
Analysis of daily solar power prediction with data-driven approaches
  • Apr 23, 2014
  • Applied Energy
  • Huan Long + 2 more

