Design of a new real-time predictive model of quench oil viscosity based on statistical learning methods
The high viscosity of quench oil is a critical problem in the quench system of ethylene cracking furnaces in petrochemical plants, because it affects the safety and stability of equipment; moreover, variation in quench oil viscosity has a negative impact on the yield of ethylene and other chemical products. This paper presents a new statistical learning model to forecast real-time variation in quench oil viscosity, combining a statistical algorithm with a machine learning method. First, the statistical algorithm is applied to reduce the dimensionality of the parameters; second, a real-time predictive model is fitted using the machine learning method. Simulation results show that this model can monitor the hourly variation in viscosity from identified controllable parameters that are highly correlated with quench oil viscosity.
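The two-step pipeline this abstract describes (statistical dimension reduction, then a fitted predictive model) can be sketched as follows. This is a minimal illustration under invented assumptions: the synthetic "process parameters" and "viscosity" data, the choice of PCA for the reduction step, and ordinary least squares as a stand-in for the machine learning step are not the paper's actual data or method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for hourly process data: 200 samples of 10
# candidate operating parameters; a few dominant parameters carry most
# of the variance and drive viscosity (all names and values invented).
X = rng.normal(size=(200, 10))
X[:, [0, 3, 7]] *= 3.0                       # dominant process parameters
y = (2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * X[:, 7]
     + 0.1 * rng.normal(size=200))           # "viscosity" response

# Step 1: statistical dimension reduction (PCA via SVD).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 4                                        # keep the leading components
Z = Xc @ Vt[:k].T                            # reduced design matrix

# Step 2: fit the predictive model on the reduced features
# (ordinary least squares as the simplest stand-in for the ML step).
A = np.column_stack([np.ones(len(Z)), Z])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
rmse = float(np.sqrt(np.mean((y - A @ coef) ** 2)))
print(f"training RMSE: {rmse:.3f}")
```

Because the dominant-variance directions here coincide with the parameters driving the response, the four retained components are enough for an accurate fit; in practice the choice of k would be validated against held-out data.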
2901
- 10.4135/9781412985130
- Jan 1, 1978
6978
- 10.1198/016214506000000735
- Dec 1, 2006
- Journal of the American Statistical Association
14
- 10.1016/j.cej.2015.10.093
- Nov 2, 2015
- Chemical Engineering Journal
50
- 10.1007/978-1-60761-580-4_14
- Dec 15, 2009
77
- 10.1016/j.jeconom.2015.09.004
- Oct 27, 2015
- Journal of Econometrics
88
- 10.1111/rssb.12108
- Feb 15, 2015
- Journal of the Royal Statistical Society. Series B, Statistical Methodology
15672
- 10.1007/bf00058655
- Aug 1, 1996
- Machine Learning
- Research Article
19
- 10.1186/s13244-023-01441-6
- May 18, 2023
- Insights into Imaging
Objectives: This study aimed to explore and develop artificial intelligence approaches for efficient classification of pulmonary nodules based on CT scans. Materials and methods: A total of 1007 nodules were obtained from 551 patients of the LIDC-IDRI dataset. All nodules were cropped into 64 × 64 PNG images, and preprocessing was carried out to remove the surrounding non-nodular structure from each image. For the machine learning method, Haralick texture and local binary pattern features were extracted. Four features were selected using the principal component analysis (PCA) algorithm before running the classifiers. For deep learning, a simple CNN model was constructed, and transfer learning was applied using VGG-16, VGG-19, DenseNet-121, DenseNet-169, and ResNet as pre-trained models with fine-tuning. Results: With the statistical machine learning method, the optimal AUROC was 0.885 ± 0.024 with the random forest classifier, and the best accuracy was 0.819 ± 0.016 with the support vector machine. In deep learning, the best accuracy reached 90.39% with the DenseNet-121 model, and the best AUROC was 96.0%, 95.39%, and 95.69% with the simple CNN, VGG-16, and VGG-19, respectively. The best sensitivity reached 90.32% using DenseNet-169, and the best specificity attained was 93.65% when applying DenseNet-121 and ResNet-152V2. Conclusion: Deep learning methods with transfer learning showed several benefits over statistical learning in terms of nodule prediction performance and savings in effort and time when training on large datasets. SVM and DenseNet-121 showed the best performance compared with their counterparts. There is still room for improvement, especially when more data can be trained and lesion volume is represented in 3D. Clinical relevance statement: Machine learning methods offer unique opportunities and open new venues in the clinical diagnosis of lung cancer. The deep learning approach has been more accurate than statistical learning methods. SVM and DenseNet-121 showed superior performance in pulmonary nodule classification.
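The statistical branch of the study above (texture features, PCA down to four components, a classifier, then AUROC) can be illustrated with a small numpy sketch. Everything here is an assumption for illustration: the synthetic "texture features", the nearest-centroid classifier standing in for SVM/random forest, and the sample sizes; only the pipeline shape follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for texture features (e.g. Haralick / LBP):
# 100 benign and 100 malignant nodules, 12 features each, with the
# class signal concentrated in a few feature directions.
n = 100
benign = rng.normal(0.0, 1.0, size=(n, 12))
malign = rng.normal(0.0, 1.0, size=(n, 12))
malign[:, :3] += 1.5                      # shift on the informative features
X = np.vstack([benign, malign])
y = np.r_[np.zeros(n), np.ones(n)]

# PCA down to 4 components, as in the abstract.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:4].T

# Nearest-centroid classifier as a minimal stand-in for SVM / RF.
c0, c1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
# Score: difference of squared distances (higher -> more malignant-like).
score = ((Z - c0) ** 2).sum(axis=1) - ((Z - c1) ** 2).sum(axis=1)

# AUROC via the rank-sum (Mann-Whitney) formulation.
ranks = score.argsort().argsort() + 1     # ranks starting at 1
auroc = (ranks[y == 1].sum() - n * (n + 1) / 2) / (n * n)
print(f"AUROC: {auroc:.3f}")
```

The rank-sum formula gives the same value as integrating the ROC curve, which is why it is a convenient check when no ML library is at hand.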
- Research Article
14
- 10.1016/j.ijmedinf.2020.104148
- May 13, 2020
- International Journal of Medical Informatics
Assessing reproducibility and veracity across machine learning techniques in biomedicine: A case study using TCGA data
- Preprint Article
- 10.5194/egusphere-egu21-2451
- Mar 3, 2021
CO2-induced warming is approximately proportional to the total amount of CO2 emitted. This emergent property of the climate system, known as the Transient Climate Response to cumulative CO2 Emissions (TCRE), gave rise to the concept of a remaining carbon budget, which specifies a cap on global CO2 emissions in line with reaching a given temperature target, such as those in the Paris Agreement (e.g., Matthews et al. 2020). However, estimating the policy-relevant TCRE metric directly from observation-based data products remains challenging due to the non-CO2 forcing and land-use change emissions present in real-world climate conditions.
Here, we present preliminary results from applying and comparing different statistical learning methods to determine TCRE (and later, remaining carbon budgets) from (i) climate model output and (ii) observational data products. First, we make use of a 'perfect-model' setting, i.e., using output from physics-based climate models (CMIP5 and CMIP6) under historical forcing, treated as pseudo-observations. This output is used to train different statistical-learning models and to make predictions of TCRE (which is known from climate model simulations under CO2-only forcing, per experimental design). Next, we use the trained statistical learning models to make TCRE predictions directly from the observation-based data products.
We also explore the interpretability of the applied techniques, to determine where the statistical models are learning from, what the regions of importance are, and what the key input features and weights are. Explainable AI methods (e.g., McGovern et al. 2019; Molnar 2019; Samek et al. 2019) present a promising way forward in linking data-driven statistical and machine learning methods with traditional physical climate science, while leveraging the large amount of data in observational data products to provide more robust estimates of often policy-relevant climate metrics.
- Supplementary Content
26
- 10.1097/md.0000000000029387
- Jun 24, 2022
- Medicine
Background: Adverse drug reactions (ADRs) are unintended negative drug-induced responses. Determining the association between drugs and ADRs is crucial, and several methods have been proposed to demonstrate this association. This systematic review aimed to examine the analytical tools used, by considering original articles that utilized statistical and machine learning methods for detecting ADRs. Methods: A systematic literature review was conducted based on articles published between 2015 and 2020. The keywords used were statistical, machine learning, and deep learning methods for detecting ADR signals. The study was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Results: We reviewed 72 articles, of which 51 addressed statistical methods and 21 addressed machine learning methods. Electronic medical record (EMR) data were exclusively analyzed using regression methods. For FDA Adverse Event Reporting System (FAERS) data, variants of the disproportionality method were preferred. DrugBank was the most-used database for machine learning; among the machine learning approaches, 'other' methods were the most common, followed by supervised methods. Conclusions: Based on the 72 main articles, this review provides guidelines on which databases are frequently utilized and which analysis methods can be paired with them. For statistical analysis, more than 90% of the cases were analyzed by disproportionality or regression analysis of spontaneous reporting system (SRS) or electronic medical record (EMR) data; machine learning research, in contrast, showed a strong tendency to analyze various data combinations. DrugBank featured in only about half of the machine learning studies, and the k-nearest neighbor method accounted for the greatest proportion.
- Research Article
7
- 10.3390/genes13081494
- Aug 21, 2022
- Genes
Genomic selection (GS) changed the way plant breeders select genotypes. GS takes advantage of phenotypic and genotypic information to train a statistical machine learning model, which is used to predict the phenotypic (or breeding) values of new lines for which only genotypic information is available. Many statistical machine learning methods have therefore been proposed for this task. Multi-trait (MT) genomic prediction models take advantage of correlated traits to improve prediction accuracy, so some multivariate statistical machine learning methods are popular for GS. In this paper, we compare the prediction performance of three MT methods: the MT genomic best linear unbiased predictor (GBLUP), MT partial least squares (PLS), and multi-trait random forest (RF). Benchmarking was performed with six real datasets. We found that the three investigated methods produce similar results, but under predictors with genotype (G) and environment (E), that is, E + G, the MT GBLUP achieved superior performance, whereas under predictors E + G + genotype × environment interaction (GE) and G + GE, random forest achieved the best results. We also found that the best predictions were achieved under the predictors E + G and E + G + GE. Here, we also provide the R code for the implementation of these three statistical machine learning methods in the sparse kernel method (SKM) library, which offers not only options for single-trait prediction with various statistical machine learning methods but also options for MT prediction that can help capture the complex patterns common in genomic selection datasets.
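A minimal numpy sketch of the GBLUP-style branch of this comparison, written as kernel ridge regression on a linear genomic relationship kernel applied jointly to several traits. The genotype matrix, trait model, train/test split, and regularization value are all invented for illustration; this is not the SKM library's implementation (the paper's own R code in SKM covers the full MT GBLUP / PLS / RF comparison).

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical genotype matrix: 150 lines x 300 markers coded {0, 1, 2}.
n, p, t = 150, 300, 3
M = rng.integers(0, 3, size=(n, p)).astype(float)
Mc = M - M.mean(axis=0)

# Three correlated traits driven by an overlapping sparse set of QTL.
b = rng.normal(size=(p, t)) * (rng.random((p, 1)) < 0.1)
Y = Mc @ b + rng.normal(scale=1.0, size=(n, t))

# Genomic relationship (linear) kernel, as in GBLUP.
K = Mc @ Mc.T / p

# Kernel ridge ("GBLUP-like") fit on a training split, predicting the rest;
# the same dual coefficients are solved for all traits at once.
train, test = np.arange(0, 120), np.arange(120, n)
lam = 1.0
alpha = np.linalg.solve(K[np.ix_(train, train)] + lam * np.eye(len(train)),
                        Y[train])
Y_hat = K[np.ix_(test, train)] @ alpha

# Per-trait predictive correlation, the usual GS accuracy measure.
acc = [float(np.corrcoef(Y[test][:, j], Y_hat[:, j])[0, 1]) for j in range(t)]
print("predictive correlations:", [round(a, 2) for a in acc])
```

Solving the dual system once for a matrix of traits is what makes the GBLUP formulation convenient for MT prediction: the kernel depends only on the genotypes, not on the traits.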
- Dissertation
1
- 10.31274/rtd-180813-17030
- Apr 29, 2015
In this PhD project, several related research topics are pursued: data mining of coarse-grained side chain orientation in the Protein Data Bank and the prediction of such orientation for each individual residue using statistical learning methods; the motions of proteins and protein complexes using the elastic network model and statistical methods; and the clustering of structures within an ensemble of NMR-derived protein structures. The first research topic concerns side chain orientation in protein structures. A coarse-grained measurement of side chain orientation is used, and the relationship between this measurement and the hydrophobicity of the residue type is established. Alongside the research on side chain orientation, visualization software for this coarse-grained side chain orientation is developed using OpenGL and C++. In addition, several predictive models for the side chain orientation of individual residues are constructed using several statistical machine learning methods (general linear regression, regression trees, bagging of regression trees, neural networks, and support vector machines). The second topic concerns the dynamics of proteins and protein complexes using the elastic network model. In this part, the effects of different superposition methods on the correspondence between the normal modes and the experimental conformational changes extracted from a cluster of structures using principal component analysis are studied, and we obtain a better correspondence for some protein structures using the maximum-likelihood-based superposition method. We also apply the elastic network model to study the dynamics of the small ribosomal subunit: we perform a series of protein subunit removal computational experiments and study the effect of removing protein subunits on the motion of the partial 30S structures simulated with the elastic network model.
Through these studies, we find that S6 interacts with S18 in the small ribosomal subunit, which is consistent with previous computational and experimental results from other researchers. Another project is the application of the principal component shaving method for clustering structures in an ensemble of NMR-derived protein structures. Principal component shaving is often used to find similar gene expression patterns in microarray experiments, and here this method is applied to cluster similar structures in an ensemble of NMR-derived protein structures. The results show that similar structures can be clustered together using this method. For this PhD project, the results on coarse-grained side chain orientation and the prediction of side chain orientation for each residue are already published; I was the first author of these two papers. For the study of the effects of different superposition methods on the correspondence between the experimental conformational changes from principal component analysis and the normal modes, the application of the ANM to the 30S subunit, and the application of principal component shaving for clustering structures in an ensemble of NMR-derived protein structures, we will submit our papers soon.
- Research Article
104
- 10.1016/j.tust.2020.103699
- Dec 21, 2020
- Tunnelling and Underground Space Technology
Prediction of tunnel boring machine operating parameters using various machine learning algorithms
- Research Article
62
- 10.3168/jds.2020-19576
- Apr 15, 2021
- Journal of Dairy Science
Predicting cow milk quality traits from routinely available milk spectra using statistical machine learning methods
- Research Article
- 10.1080/02664763.2024.2315451
- Feb 13, 2024
- Journal of Applied Statistics
Computational Medicine encompasses the application of Statistical Machine Learning and Artificial Intelligence methods on several traditional medical approaches, including biochemical testing which is extremely valuable both for early disease prognosis and long-term individual monitoring, as it can provide important information about a person's health status. However, using Statistical Machine Learning and Artificial Intelligence algorithms to analyze biochemical test data from Electronic Health Records requires several preparatory steps, such as data manipulation and standardization. This study presents a novel approach for utilizing Electronic Health Records from large, real-world databases to develop predictive precision medicine models by exploiting Artificial Intelligence. Furthermore, to demonstrate the effectiveness of this approach, we compare the performance of various traditional Statistical Machine Learning and Deep Learning algorithms in predicting individuals' future biochemical test outcomes. Specifically, using data from a large real-world database, we exploit a longitudinal format of the data in order to predict the future values of 15 biochemical tests and identify individuals at high risk. The proposed approach and the extensive model comparison contribute to the personalized approach that modern medicine aims to achieve.
- Book Chapter
7
- 10.1016/b978-0-12-821838-9.00005-0
- Jan 1, 2021
- Mathematical Modelling of Contemporary Electricity Markets
Chapter 4 - Forecasting week-ahead hourly electricity prices in Belgium with statistical and machine learning methods
- Research Article
- 10.5445/ir/1000134512
- Jan 1, 2020
Time series forecasting is a crucial task in various fields of business and science. There are two coexisting approaches to time series forecasting: statistical methods and machine learning methods. Both come with different strengths and limitations. Statistical methods such as the Holt-Winters' Method or ARIMA have been practiced for decades. They stand out due to their robustness and flexibility. Furthermore, these methods work well when little data is available and can exploit a priori knowledge. However, statistical methods assume linear relationships in the data, which is not necessarily the case in real-world data, inhibiting forecasting performance. On the other hand, machine learning methods such as Multilayer Perceptrons or Long Short-Term Memory Networks do not assume linearity and have the exceptional advantage of universally approximating almost any function. In addition, machine learning methods can exploit cross-series information to enhance an individual forecast. Besides these strengths, machine learning methods face several limitations in terms of data and computation requirements. Hybrid methods promise to advance time series forecasting by combining the best of statistical and machine learning methods. The fundamental idea is that the combination compensates for the limitations of one approach with the strengths of the other. This thesis shows that the combination of a Holt-Winters' Method and a Long Short-Term Memory Network is promising when the periodicity of a time series can be precisely specified. The precise specification enables the Holt-Winters' Method to simplify the forecasting task for the Long Short-Term Memory Network and, consequently, facilitates the hybrid method in obtaining accurate forecasts. The research question to be answered is which characteristics of a time series determine the superiority of either statistical, machine learning, or hybrid approaches.
The results of the conducted experiment show that this research question cannot be answered in general. Nevertheless, the results propose findings for specific forecasting methods. The Holt-Winters' Method provides reliable forecasts when the periodicity can be precisely determined. ARIMA, however, handles overlying seasonalities better than the Holt-Winters' Method due to its autoregressive approach. Furthermore, the results suggest the hypothesis that machine learning methods have difficulties extrapolating time series with trend. Finally, the Multilayer Perceptron can produce accurate forecasts for various time series despite its simplicity, and the Long Short-Term Memory Network proves that it needs relevant datasets of adequate length to produce accurate forecasts.
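The Holt-Winters' component of the hybrid described above can be sketched with the standard additive recursions. This is a minimal sketch under stated assumptions: the initialisation scheme, smoothing constants, and synthetic monthly-style series are invented, and the LSTM residual model the thesis pairs with Holt-Winters is omitted.

```python
import numpy as np

def holt_winters_additive(y, m, alpha=0.3, beta=0.05, gamma=0.2, h=12):
    """Minimal additive Holt-Winters; y is 1-D, m the seasonal period."""
    # Crude initialisation from the first two seasons.
    level = y[:m].mean()
    trend = (y[m:2 * m].mean() - y[:m].mean()) / m
    season = list(y[:m] - level)
    for i in range(len(y)):
        last_level = level
        s = season[i % m]
        level = alpha * (y[i] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        season[i % m] = gamma * (y[i] - level) + (1 - gamma) * s
    # h-step-ahead forecasts from the final level, trend, and seasonals.
    return np.array([level + (k + 1) * trend + season[(len(y) + k) % m]
                     for k in range(h)])

# Synthetic series: linear trend + period-12 seasonality + small noise.
rng = np.random.default_rng(3)
t = np.arange(120)
y = 10 + 0.1 * t + 3 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.2, 120)

fc = holt_winters_additive(y[:108], m=12, h=12)
mae = float(np.abs(fc - y[108:]).mean())
print(f"MAE of 12-step forecast: {mae:.3f}")
```

When the period m is specified correctly, as here, the seasonal states converge after a few cycles and the forecast tracks the held-out year closely; misspecifying m is exactly the failure mode the thesis highlights.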
- Research Article
16
- 10.1038/s41612-023-00536-7
- Dec 20, 2023
- npj Climate and Atmospheric Science
Traditional statistical methods (TSM) and machine learning (ML) methods have been widely used to separate the effects of emissions and meteorology on air pollutant concentrations, while their performance compared to the chemistry transport model has been less fully investigated. Using the Community Multiscale Air Quality Model (CMAQ) as a reference, a series of experiments was conducted to comprehensively investigate the performance of TSM (e.g., multiple linear regression and Kolmogorov–Zurbenko filter) and ML (e.g., random forest and extreme gradient boosting) approaches in quantifying the effects of emissions and meteorology on the trends of fine particulate matter (PM2.5) during 2013−2017. Model performance evaluation metrics suggested that the TSM and ML methods can explain the variations of PM2.5, with the highest performance from ML. The trends of PM2.5 showed insignificant differences (p > 0.05) for both the emission-related (PM2.5^EMI) and meteorology-related components between the TSM, ML, and CMAQ modeling results. PM2.5^EMI estimated from ML showed the least difference from that of CMAQ. Considering the medium computing resources and low model biases, the ML method is recommended for weather normalization of PM2.5.
Sensitivity analysis further suggested that the ML model with optimized hyperparameters and the exclusion of temporal variables in weather normalization can further produce reasonable results in emission-related trends of PM2.5.
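The weather-normalization idea used in the study above (predict concentrations while resampling meteorology, so that only the emission-driven signal remains) can be sketched as follows. A linear model stands in for the paper's random forest / XGBoost, and the hourly record, meteorological variables, and coefficients are all synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic hourly record: PM2.5 driven by a declining emission trend
# plus meteorology (temperature, wind speed) and noise. All hypothetical.
n = 2000
temp = rng.normal(15, 8, n)
wind = rng.gamma(2.0, 1.5, n)
trend = np.linspace(80, 40, n)                # emissions decline over time
pm25 = trend + 0.8 * temp - 4.0 * wind + rng.normal(0, 3, n)

# Fit a simple model PM2.5 ~ time + meteorology
# (a linear stand-in for the RF / XGBoost models in the paper).
t = np.arange(n) / n
Xd = np.column_stack([np.ones(n), t, temp, wind])
coef, *_ = np.linalg.lstsq(Xd, pm25, rcond=None)

# Weather normalisation: for each time step, average predictions over
# meteorology resampled from the whole record, leaving the emission signal.
idx = rng.integers(0, n, size=(200, n))       # 200 meteorology resamples
deweathered = np.mean(
    coef[0] + coef[1] * t + coef[2] * temp[idx] + coef[3] * wind[idx], axis=0)

print(f"normalised start/end: {deweathered[:100].mean():.1f} "
      f"-> {deweathered[-100:].mean():.1f}")
```

Averaging over resampled meteorology removes the weather-driven variability, so the deweathered series recovers the underlying emission-related decline up to a constant offset.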
- Research Article
3
- 10.1111/stan.12326
- Nov 3, 2023
- Statistica Neerlandica
One of the main objectives of time series analysis is forecasting, so both machine learning methods and statistical methods have been proposed in the literature. In this study, we compare the forecasting performance of some of these approaches. In addition to traditional forecasting methods, which are the Naive and Seasonal Naive methods, S/ARIMA, Exponential Smoothing, TBATS, Bayesian Exponential Smoothing Models with Trend Modifications, and STL Decomposition, forecasts are also obtained using seven different machine learning methods, which are Random Forest, Support Vector Regression, XGBoosting, BNN, RNN, LSTM, and FFNN, as well as hybridizations of statistical time series and machine learning methods. The data set is selected proportionally from various time domains in the M4 Competition data set. Thereby, we aim to create a forecasting guide that considers different preprocessing approaches, methods, and data sets spanning various time domains. After the experiment, the performance and impact of all methods are discussed. Most of the best-performing models turn out to be machine learning methods. Moreover, forecasting performance is affected by both the time frequency and the forecast horizon. Lastly, the study suggests that the hybrid approach is not always the best model for forecasting. Hence, this study provides guidelines for understanding which methods perform better at different time series frequencies.
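The two simplest baselines in this comparison, the Naive and Seasonal Naive methods, can be sketched together with the sMAPE accuracy measure commonly used on M4 data. The quarterly-style series here is synthetic and the period-4 seasonality is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic quarterly-style series: level + period-4 seasonality + noise.
t = np.arange(80)
y = 50 + 5 * np.sin(2 * np.pi * t / 4) + rng.normal(0, 1, 80)
train, test = y[:72], y[72:]

def smape(actual, forecast):
    """Symmetric MAPE in percent, as used in the M4 competition."""
    return float(200 * np.mean(np.abs(forecast - actual)
                               / (np.abs(actual) + np.abs(forecast))))

h = len(test)
naive = np.repeat(train[-1], h)                  # Naive: repeat last value
snaive = np.tile(train[-4:], h // 4 + 1)[:h]     # Seasonal Naive, period 4
print(f"sMAPE  naive: {smape(test, naive):.2f}")
print(f"sMAPE snaive: {smape(test, snaive):.2f}")
```

On a series whose seasonal swing dominates the noise, the Seasonal Naive baseline beats the plain Naive method by a wide margin, which is why both are kept as reference points when judging the more elaborate statistical and ML models.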
- Research Article
1
- 10.5812/amh-121764
- May 19, 2022
- Annals of Military and Health Sciences Research
Background: This study aimed to investigate the oral health presentations of coronavirus disease 2019 (COVID-19) inpatients using statistical analysis and machine learning methods before infection, during hospitalization, and after discharge from the hospital. Methods: This cross-sectional study was conducted on 140 hospitalized COVID-19 patients with a reverse transcription-polymerase chain reaction diagnosis and severe symptoms. Demographic data, clinical characteristics, oral health habits, and oral manifestations in three periods (i.e., before infection, during hospitalization, and after discharge from the hospital) were recorded through a questionnaire and oral examination. Statistical analysis and machine learning methods were used for the analysis of patients' data. Results: Xerostomia, dysgeusia, hypogeusia, halitosis, and a metallic taste were the most frequent oral symptoms during hospitalization, with incidences of 68.6%, 51.4%, 49.3%, 31.4%, and 29.3%, respectively. Using tobacco significantly increased the incidence of xerostomia, dysgeusia, hypogeusia, halitosis, and a metallic taste during hospitalization (P = 0.011, P = 0.001, P = 0.002, P = 0.0001, and P = 0.0001, respectively). Smoking led to increased dysgeusia, hypogeusia, halitosis, and a metallic taste during hospitalization (P = 0.019, P = 0.014, P = 0.013, and P = 0.006, respectively). The micro-average receiver operating characteristic (ROC) curve analysis revealed that the machine learning logistic regression model achieved the highest area under the ROC curve, with a value of 0.83. Conclusions: Xerostomia and dysgeusia are the most common oral symptoms of COVID-19 patients and could be used to predict COVID-19 infection. Dysgeusia correlates with xerostomia, and it is hypothesized that xerostomia is an etiologic factor for dysgeusia. The early detection of COVID-19 can help reduce the enormous burden on healthcare systems, and machine learning is advantageous for this purpose.
- Research Article
1
- 10.7176/jstr/5-3-19
- Mar 1, 2019
- International Journal of Scientific and Technological Research
Demand forecasting is important for planning the future of a seaport facility. In this paper, different methods are compared on the demand forecasting problem of a seaport in Turkey. Three types of data (general cargo, container, vehicle) were collected for the period 2012-2017. The use of machine learning for demand forecasting was found to be an important missing link in earlier studies, and it was observed that there are far more studies on demand forecasting for container terminals than for maritime terminals. Statistical forecasting methods and machine learning methods were applied to all types of data to determine the best estimation method and to forecast the handling volumes for the next two years. The forecasting performances of the statistical and machine learning methods were comparatively analysed. According to the chosen accuracy measures, multiplicative Holt-Winters' was recognized as the best forecasting method for container and vehicle handling volumes, whereas a machine learning method gave the best forecasts for the general cargo. Keywords: demand forecasting, seaport, machine learning, statistical modeling