Performance of multi-layer perceptron-neural network versus random forest regression for sea level rise prediction

Similar Papers
  • Research Article
  • 10.5282/ubm/epub.73377
Diversity Forests: Using Split Sampling to Allow for Complex Split Procedures in Random Forest
  • Sep 8, 2020
  • Roman Hornung

Diversity forests are a class of random forest type prediction methods that modify the split selection procedure of conventional random forests to allow for complex split procedures. While random forests show strong prediction performance when using conventional univariate, binary splitting, the procedure still has disadvantages. For example, interactions between features are not exploited effectively. The split selection procedure of diversity forests consists of choosing the best splits from sets of 'nsplits' candidate splits obtained by random selection from repeatedly sampled, specifically structured collections of splits. This makes complex split procedures computationally tractable while avoiding overfitting. This paper focuses on introducing diversity forests and evaluating their performance for univariate, binary splitting. Specific, complex split procedures will be the focus of future work. Using a collection of 220 real data sets with binary target variables, diversity forests are compared with conventional random forests and random forests using extremely randomized trees. Randomizing the split selection, as performed by diversity forests, leads to slight improvements in prediction performance, and this performance is quite robust with regard to the specified 'nsplits' value. These results indicate that diversity forests are well suited for realizing complex split procedures in random forests.
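
The best-of-'nsplits' split sampling described above can be illustrated with a small toy function. This is a minimal sketch in Python/NumPy, not the author's diversityForest implementation; the Gini criterion and uniform sampling over observed values are assumptions made for illustration.

```python
import numpy as np

def gini(y):
    """Gini impurity of a binary (0/1) label vector."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    return 2.0 * p * (1.0 - p)

def best_of_nsplits(X, y, nsplits=30, rng=None):
    """Pick the best of `nsplits` randomly sampled (feature, threshold)
    candidate splits instead of exhaustively searching all possible splits."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    best, best_impurity = None, np.inf
    for _ in range(nsplits):
        j = rng.integers(d)              # random feature
        t = rng.choice(X[:, j])          # random observed value as threshold
        left, right = y[X[:, j] <= t], y[X[:, j] > t]
        if len(left) == 0 or len(right) == 0:
            continue
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_impurity, best = impurity, (j, t)
    return best, best_impurity
```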

  • Research Article
  • 10.5281/zenodo.3247036
Predictive Models of Student Performance for Data-Driven Learning Analytics
  • Jan 1, 2019
  • Sean M Shiverick

Analytic tools are useful for detecting patterns in education data and providing insights about student performance and learning. This study compared six supervised learning algorithms (linear regression, ridge regression, the lasso, regression trees, random forest regression, gradient boosted regression) and identified features important for predicting student performance. The dataset consisted of N=1044 observations from two secondary schools in Portugal (UCI-MLR, Cortez & Silva, 2008). Performance was assessed by final grades (range: 0-20) in two courses, mathematics and Portuguese. The models were fit to training data with 27 independent variables and evaluated on a testing subset. Overall, performance was lower for students in mathematics than in Portuguese. The models selected a similar set of variables as important for predicting performance: mother's education level, student plans for higher education, and weekly study time were positively related to predicted performance, whereas course subject, school educational support, and romantic relationships were associated with decreased student performance. The models differed in the number, weighting, order and importance given to predictor variables. Linear regression provided a model with 13 predictors. Ridge regression shrank the coefficient estimates toward zero; the lasso performed variable selection for a model with 20 predictors. There was a tradeoff between model complexity and interpretability. The single pruned regression tree provided a simple, interpretable non-linear model with four features. Random forest regression and gradient boosting reduced overfitting, but were more difficult to interpret. Advantages and limitations of the different models are discussed. Applications for educational data mining (EDM) and learning analytics (LA) are considered.
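
A comparison of this kind can be sketched with scikit-learn. The synthetic data below merely stands in for the 27-variable student-performance table, and the hyperparameters are illustrative defaults rather than the study's settings.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Placeholder data standing in for the 27-variable student-performance table.
X, y = make_regression(n_samples=1044, n_features=27, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "regression_tree": DecisionTreeRegressor(max_depth=4, random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=300, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    print(f"{name:>18}: test RMSE = {rmse:.2f}")
```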

  • Research Article
  • Cited by 6
  • 10.1111/nyas.14015
New York City Panel on Climate Change 2019 Report Chapter 5: Mapping Climate Risk
  • Mar 1, 2019
  • Annals of the New York Academy of Sciences
  • Lesley Patrick + 4 more

  • Research Article
  • Cited by 3
  • 10.3964/j.issn.1000-0593(2018)01-0181-07
Black Soil Organic Matter Content Estimation Using Hybrid Selection Method Based on RF and GABPSO
  • Jan 1, 2018
  • Yue Ma + 3 more

To solve the problem of high-dimensional variables and characteristic wavelength selection in soil organic matter content estimation using hyperspectral data, a hybrid feature selection method that combines random forest with a self-adaptive searching method was proposed. In this hybrid method, random forest was employed to select the spectral variables of greatest importance in the modeling process as the preliminary optimal dataset. A wrapper approach combining a genetic algorithm and binary particle swarm optimization was used as the self-adaptive searching algorithm to further search variables in the preliminary dataset. Random forest was chosen as the prediction model because of its strong robustness and its excellent performance in dealing with high-dimensional variables. In this paper, soil samples collected in a typical black soil region were used as the research object, and the Vis-NIR spectral data of the soil obtained from an ASD spectrometer and the organic matter content obtained through chemical analysis were used as the data sources. Following reflectance transformation and spectral resampling, the proposed hybrid selection method was employed to extract the characteristic spectral regions that were used as the input data for random forest. The prediction accuracy was compared with the results from the random forest algorithm using spectral datasets extracted, respectively, by no selection, only the random forest method, and only the self-adaptive searching method. The results showed that the random forest model with the characteristic wavelengths extracted by the proposed method obtained the highest prediction accuracy, with R², RMSE and RPD of 0.838, 0.54% and 2.534, respectively. Moreover, the proposed method was more efficient at selecting features than the other approaches. It can be concluded that the hybrid feature selection method and random forest algorithm can be effectively applied to black soil organic matter content estimation using hyperspectral data, and this also provides a reference for solving the problem of variable selection and modeling when estimating the organic matter content of other soil types.
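
The first stage of such a hybrid scheme, using random-forest importance to pre-select spectral variables before a wrapper search, might look roughly like the sketch below. Array shapes, hyperparameters and the synthetic spectra are assumptions; the GA/BPSO wrapper stage is omitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rf_preselect(X, y, n_keep=50, random_state=0):
    """Stage 1: rank spectral variables by random-forest importance and keep
    the top `n_keep` as the preliminary optimal dataset (GA/BPSO stage omitted)."""
    rf = RandomForestRegressor(n_estimators=200, random_state=random_state)
    rf.fit(X, y)
    keep = np.argsort(rf.feature_importances_)[::-1][:n_keep]
    return np.sort(keep)

# Placeholder spectra: 150 samples x 500 wavelengths, organic-matter target.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 500))
y = 0.8 * X[:, 50] + rng.normal(scale=0.1, size=150)

selected = rf_preselect(X, y, n_keep=50)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[:, selected], y)
print("selected wavelength indices:", selected[:10], "...")
print("in-sample R^2 on selected bands:", round(model.score(X[:, selected], y), 3))
```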

  • Research Article
  • Cited by 75
  • 10.1016/j.oneear.2020.11.002
Twenty-first century sea-level rise could exceed IPCC projections for strong-warming futures
  • Dec 1, 2020
  • One Earth
  • Martin Siegert + 4 more

  • Research Article
  • 10.1111/nyas.12670
Appendix II: NPCC 2015 technical details.
  • Jan 1, 2015
  • Annals of the New York Academy of Sciences

  • Research Article
  • Cited by 97
  • 10.2112/1551-5036(2004)020[0586:uslrpf]2.0.co;2
Using Sea Level Rise Projections for Urban Planning in Australia
  • Apr 1, 2004
  • Journal of Coastal Research
  • K J E Walsh + 6 more

This study deals with incorporating predictions of sea level rise into practical municipal planning. Predictions of global mean sea level rise can be made with more confidence than many other aspects of climate change science. The world has warmed in the past century, and as a result global mean sea level has risen and is expected to continue to rise. Even so, there are significant uncertainties regarding predictions of sea level. These arise from two main sources: the future amount of greenhouse gases in the atmosphere, and the ability of models to predict the impact of increasing concentrations of greenhouse gases. Current knowledge regarding the effect of global warming on sea level rise is reviewed. Global mean sea level is expected to rise by 3–30 cm by 2040, and 9–88 cm by 2100. An important remaining uncertainty is the future contribution of surface water storage (for example, lakes and reservoirs) to changes in sea level. In addition, there are also significant local sea level effects that need to be taken into account in many regions of the globe, including isostatic and tectonic effects. The thermal expansion component of sea level rise is also likely to vary regionally, due to regional differences in the rate of downward mixing of heat and to changes in ocean currents. The current state of planning for sea level rise in Australia is reviewed. While not all coastal municipalities include sea level rise in their planning schemes, the recent adoption in a number of States of new planning schemes with statutory authority creates a changed planning environment for local government. Coastal urban planning needs to take sea level rise into account because its effects will become apparent within the typical replacement time of urban infrastructure such as buildings (about 70 years). For local planning, a risk assessment methodology would ideally be employed to estimate the risk caused by sea level rise. In many locations, planning thresholds would also have to be considered in the light of possible changes in storm surge climatology due to changes in storm frequency and intensity, and (in some locations) changes to return periods of riverine flooding. In the medium term (decades), urban beaches will need beach re-nourishment and associated holding structures such as sea walls. Changes in storm and wave climatology are crucial factors for determining future coastal erosion.

  • Research Article
  • 10.14456/easr.2018.28
Performance evaluation of supervised learning algorithms with various training data sizes and missing attributes
  • Sep 14, 2018
  • Engineering and Applied Science Research
  • Chaluemwut Noyunsan + 2 more

Supervised learning is a machine learning technique used for creating a data prediction model. This article focuses on finding high-performance supervised learning algorithms under varied training data sizes, varied numbers of attributes, and time spent on prediction. This study evaluated seven algorithms, Boosting, Random Forest, Bagging, Naive Bayes, K-Nearest Neighbours (K-NN), Decision Tree, and Support Vector Machine (SVM), on seven data sets that are standard benchmarks from the University of California, Irvine (UCI), with two evaluation metrics and experimental settings of various training data sizes and missing key attributes. Our findings reveal that Bagging, Random Forest, and SVM are overall the three most accurate algorithms. However, when key attribute values may be missing, K-NN is recommended, as its performance is affected the least. Alternatively, when training data sizes may not be large enough, Naive Bayes is preferable, since it is the algorithm least sensitive to training data size. The algorithms are characterized on a two-dimensional chart based on prediction performance and computation time. This chart is expected to guide a novice user in choosing an appropriate method for his/her needs. Based on this chart, in general, Bagging and Random Forest are the two most recommended algorithms because of their high performance and speed.
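
The experimental setup, fitting the seven classifiers on progressively larger slices of the training data and scoring them on a fixed test set, can be sketched as follows. One UCI benchmark stands in for the paper's seven data sets, and the missing-attribute condition is not reproduced.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# A single UCI benchmark as a stand-in for the paper's seven data sets.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

classifiers = {
    "boosting": AdaBoostClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "bagging": BaggingClassifier(random_state=0),
    "naive_bayes": GaussianNB(),
    "knn": KNeighborsClassifier(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm": SVC(),
}

for frac in (0.1, 0.25, 0.5, 1.0):        # varied training-data sizes
    n = max(20, int(frac * len(X_train)))
    for name, clf in classifiers.items():
        clf.fit(X_train[:n], y_train[:n])
        acc = accuracy_score(y_test, clf.predict(X_test))
        print(f"train fraction {frac:.2f} | {name:>13}: accuracy = {acc:.3f}")
```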

  • Research Article
  • Cited by 1
  • 10.14773/cst.2019.18.2.61
Modeling of Flow-Accelerated Corrosion using Machine Learning: Comparison between Random Forest and Non-linear Regression
  • Apr 30, 2019
  • Corrosion science and technology
  • Gyeong-Geun Lee + 9 more

Flow-Accelerated Corrosion (FAC) is a phenomenon in which a protective coating on a metal surface is dissolved by a flow of fluid in a metal pipe, leading to continuous wall-thinning. Recently, many countries have developed computer codes to manage FAC in power plants, and the FAC prediction model in these computer codes plays an important role in predictive performance. Herein, a FAC prediction model was developed by applying a machine learning method and the conventional nonlinear regression method. The random forest, a machine learning technique widely used in predictive modeling, allowed the FAC tendency to be calculated easily from five input variables: flow rate, temperature, pH, Cr content, and dissolved oxygen concentration. However, the model showed significant errors under some input conditions, and it was difficult to obtain proper regression results without using additional data points. In contrast, nonlinear regression analysis produced robust estimates even with relatively limited data by assuming an empirical equation, and the model showed better predictive power when the interaction between DO and pH was considered. The comparative analysis of this study is believed to provide important insights for developing a more sophisticated FAC prediction model.
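
The two modelling routes being compared, a random forest on the five inputs versus nonlinear regression against an assumed empirical equation, can be sketched like this. The data are synthetic and the empirical form is invented for illustration; the paper's actual equation and variable ranges are not reproduced.

```python
import numpy as np
from scipy.optimize import curve_fit
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the five FAC inputs: flow rate, temperature, pH, Cr, DO.
rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 5))
y = 2.0 * X[:, 0] * np.exp(-3.0 * X[:, 3]) / (1.0 + X[:, 2]) \
    + rng.normal(scale=0.02, size=200)

# Machine-learning route: random forest on the raw inputs.
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

# Conventional route: nonlinear regression against an assumed empirical form.
def empirical(X, a, b, c):
    flow, _, ph, cr, _ = X.T
    return a * flow * np.exp(-b * cr) / (1.0 + c * ph)

params, _ = curve_fit(empirical, X, y, p0=(1.0, 1.0, 1.0))

# Training-data fit only, to show both routes side by side.
rf_rmse = np.sqrt(np.mean((rf.predict(X) - y) ** 2))
fit_rmse = np.sqrt(np.mean((empirical(X, *params) - y) ** 2))
print("fitted empirical coefficients:", params)
print(f"random forest RMSE: {rf_rmse:.4f} | empirical-fit RMSE: {fit_rmse:.4f}")
```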

  • Research Article
  • Cited by 2
  • 10.3233/shti200220
Comparison of Unplanned 30-Day Readmission Prediction Models, Based on Hospital Warehouse and Demographic Data.
  • Jan 1, 2020
  • Studies in health technology and informatics
  • Dhalluin Thibault + 6 more

Anticipating unplanned hospital readmission episodes is a safety and medico-economic issue. We compared a statistical model (Logistic Regression) and machine learning algorithms (Gradient Boosting, Random Forest, and Neural Network) for predicting the risk of all-cause, 30-day hospital readmission using data from the clinical data warehouse of Rennes and from other sources. The dataset included hospital stays selected according to the criteria of the French national methodology for the 30-day readmission rate (i.e., patients older than 18 years, geolocation, no iterative stays, and no hospitalization for palliative care), with the same pre-processing for all algorithms. We calculated the area under the ROC curve (AUC) for 30-day readmission prediction by each model. In total, we included 259,114 hospital stays, with a readmission rate of 8.8%. The AUC was 0.61 for Logistic Regression, 0.69 for Gradient Boosting, 0.69 for Random Forest, and 0.62 for the Neural Network model. We obtained the best performance and reproducibility in predicting readmissions with Random Forest, and found that the algorithms performed better when data came from different sources.
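
The evaluation step, fitting each model on a common training set and comparing test-set AUCs, can be sketched as below. Synthetic, imbalanced data stand in for the Rennes warehouse extract; the study's pre-processing and feature set are not reproduced.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

# Placeholder stays: imbalanced binary outcome standing in for 30-day readmission.
X, y = make_classification(n_samples=20000, n_features=30,
                           weights=[0.91, 0.09], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "neural_network": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name:>20}: AUC = {auc:.2f}")
```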

  • Dissertation
  • 10.11588/heidok.00014379
Active Learning: New Approaches, and Industrial Applications
  • Jan 24, 2013
  • Jens Röder

Active learning is one form of supervised machine learning. In supervised learning, a set of labeled samples is passed to a learning algorithm for training a classifier. However, labeling large amounts of training samples can be costly and error-prone. Active learning deals with the development of algorithms that interactively select a subset of the available unlabeled samples for labeling, and aims at minimizing the labeling effort while maintaining classification performance. The key challenge for the development of so-called active learning strategies is the balance between exploitation and exploration: On the one hand, the estimated decision boundary needs to be refined in feature space regions where it has already been established, while, on the other hand, the feature space needs to be scanned carefully for unexpected class distributions. In this thesis, two approaches to active learning are presented that consider these two aspects in a novel way. In order to lay the foundations for the first one, it is proposed to express the uncertainty in class prediction of a classifier at a test point in terms of a second-order distribution. The mean of this distribution corresponds to the common estimate of the posterior class probabilities and thus is related to the distance of the test point to the decision boundary, whereas the spread of the distribution indicates the degree of exploration in the corresponding region of feature space. This allows for the evaluation of the utility of labeling a yet unlabeled point with respect to classifier improvement in a principled way and leads to a completely novel approach to active learning. The proposed strategy is then implemented and evaluated based on kernel density classification. The generic active learning strategy can be combined with any other classifier, but it performs best if the derived second-order distributions are sufficiently good approximations to the sampling distribution. Although second-order distributions for random forests are derived in this thesis, they do not approximate sufficiently well the sampling distribution and mainly allow only for the relative comparison of prediction uncertainty between test points. In order to combine the state of the art classification performance of random forests with the principal ideas of the first active learning approach, a related second approach for random forests is derived. It is, in addition, tailored to the demands in industrial optical inspection: bag-wise labeling with weak labels and strongly imbalanced classes. Moreover, an outlier detection scheme based on random forests is derived that is used by the proposed active learning algorithm. Finally, a new computational scheme for Gaussian process classification is presented. It is compared to two standard methods in geostatistics, both with respect to theoretical consistency and practical performance. The method evolved as a by-product when considering using Gaussian process models for active learning.
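
As a point of orientation, a generic pool-based uncertainty-sampling loop, the baseline that the thesis's second-order-distribution strategies refine, might look like the sketch below. This is plain uncertainty sampling only; the exploration-aware criteria and bag-wise weak labelling developed in the thesis are not implemented here, and the `oracle` callable is a hypothetical stand-in for a human labeller.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def uncertainty_sampling(X_pool, X_init, y_init, oracle, n_queries=20):
    """Repeatedly label the pool point whose predicted class probability is
    closest to 0.5 (binary case). Assumes y_init already contains both classes."""
    X_lab, y_lab = X_init.copy(), y_init.copy()
    pool = X_pool.copy()
    for _ in range(n_queries):
        clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_lab, y_lab)
        proba = clf.predict_proba(pool)[:, 1]
        i = np.argmin(np.abs(proba - 0.5))          # most uncertain candidate
        X_lab = np.vstack([X_lab, pool[i:i + 1]])
        y_lab = np.append(y_lab, oracle(pool[i]))   # ask the (human) oracle for a label
        pool = np.delete(pool, i, axis=0)
    return clf, X_lab, y_lab
```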

  • Discussion
  • Cited by 23
  • 10.1088/1748-9326/7/2/021001
Sea-level rise: towards understanding local vulnerability
  • Apr 10, 2012
  • Environmental Research Letters
  • Stefan Rahmstorf

  • Dissertation
  • 10.5451/unibas-007110795
Machine learning for the prediction of drug-induced toxicity
  • Jan 1, 2019
  • Verena Schöning

The knowledge of toxicological properties of compounds (e.g. drugs, chemicals, and contaminants) is crucial for drug development and for the definition of toxicological thresholds and exposure limits. However, toxicological testing, either in vitro or in vivo, is time-consuming, labour-intensive and expensive. An alternative to the classic experiments is the use of computational (in silico) approaches, such as machine learning. For machine learning, it is assumed that substances with comparable structure or molecular features also exhibit comparable pharmacological or toxicological action. Based on the comparison of substances with known pharmacological or toxicological action to substances with unknown properties, models generated using machine learning methods are able to predict the action of the latter substances. The aim of this work was the development of predictive machine learning models for the estimation of the risk of hepatotoxicity and genotoxicity. These models were then applied to two different substance groups and the outcome was compared to available literature data. The acute hepatotoxic potential of over 600 different pyrrolizidine alkaloids (PAs) was evaluated using Random Forest and artificial Neural Networks. The predicted qualitative hepatotoxicity of both models was highly correlated. Furthermore, specific structural motifs showed different hepatotoxic potential. Overall, the obtained results fitted well with already published in vitro and in vivo data on the acute hepatotoxic properties of PAs. The genotoxic/mutagenic potential of PAs was addressed using six different machine learning methods (LAZAR (Lazy Structure-Activity Relationships), Support Vector Machines, Random Forest and two Deep Learning Networks). Even though the models achieved only low to moderate accuracy rates, the best model clearly showed structure-specific differences in the predicted genotoxic potential. Furthermore, the acute hepatotoxic potential of 165 protein kinase inhibitors (PKIs) was predicted using Random Forest and artificial Neural Networks. The models confirmed clinical observations that PKIs in general have a high probability of inducing hepatotoxicity. However, interestingly, there seemed to be a target-specific difference, with inhibitors of Janus kinases having the lowest hepatotoxic probability, 60-67%. The greatest challenge is the performance of the models, which has to be validated, e.g. by cross-validation, before a model can be used on the substances of interest. Although group statements could be easily obtained, due caution has to be taken when interpreting the results of predictive models for single compounds and, if possible, comparison to already published data is advisable as a form of external validation.

  • Dissertation
  • 10.24377/ljmu.t.00010791
A machine learning classification framework for early prediction of Alzheimer's disease
  • Jun 11, 2019
  • Mohamed Mahyoub

People today, in addition to their concerns about getting old and having to watch themselves grow weak and wrinkly, face an increasing fear of dementia. There are around 47 million people affected by dementia worldwide, and the cost of providing them with health and social care support is estimated to reach $2 trillion by 2030, which is almost equivalent to the 18th largest economy in the world. The most common form of dementia, with the highest costs in health and social care, is Alzheimer's disease, which gradually kills neurons and causes patients to lose loving memories, the ability to recognise family members, childhood memories, and even the ability to follow simple instructions. Alzheimer's disease is irreversible, unstoppable and has no known cure. Besides being a calamity for affected patients, it is a great financial burden on health providers. Health care providers also face a challenge in diagnosing the disease, as current methods used to diagnose Alzheimer's disease rely on manual evaluation of a patient's medical history and mental examinations such as the Mini-Mental State Examination. These diagnostic methods often give a false diagnosis and were designed to identify Alzheimer's after stage two, when most of the symptoms are already evident. The problem is that clinicians are unable to stop or control the progress of Alzheimer's disease, because of a lack of knowledge of the patterns that triggered the development of the disease. In this thesis, we explored and investigated Alzheimer's disease from a computational perspective to uncover different risk factors and present a strategic framework called the Early Prediction of Alzheimer's Disease Framework (EPADf) that would give a future prediction of early-onset Alzheimer's disease. Following extensive background research that resulted in the formalisation of the framework concept, the prediction approaches, and the concept of ranking the risk factors based on clinical instinct, knowledge and experience using mathematical reasoning, we carried out experiments to gain further insight and investigate the disease using machine learning models. In this study, we used machine learning models and conducted two classification experiments for early prediction of Alzheimer's disease, and one ranking experiment to rank its risk factors by importance. Besides these experiments, we also presented two logical approaches to search for patterns in an Alzheimer's dataset, and a ranking algorithm to rank Alzheimer's disease risk factors based on clinical evaluation. For the classification experiments we used five different machine learning models: Random Forest (RF), Random Oracle Model (ROM), a hybrid model combining a Levenberg-Marquardt neural network and Random Forest using Fisher discriminant analysis (H2), Linear Neural Networks (LNN), and the Multi-layer Perceptron (MLP). These models were deployed on de-identified multivariable patient data, provided by the ADNI (Alzheimer's Disease Neuroimaging Initiative), to illustrate the effective use of data analysis in investigating the biological and behavioural risk factors of Alzheimer's disease. We found that the continuous enhancement of patient data and the use of combined machine learning models can provide an early, cost-effective prediction of Alzheimer's disease, and can help in extracting insightful information on the risk factors of the disease. Based on this work and its findings, we developed the strategic framework (EPADf), which is discussed in more depth in this thesis.

  • Research Article
  • Cited by 8
  • 10.3390/rs16030551
Assessment and Prediction of Sea Level and Coastal Wetland Changes in Small Islands Using Remote Sensing and Artificial Intelligence
  • Jan 31, 2024
  • Remote Sensing
  • Nawin Raj + 1 more

Pacific Island countries are vulnerable to the impacts of climate change, which include the risks of increased ocean temperatures, sea level rise and coastal wetland loss. The destruction of wetlands leads not only to a loss of carbon sequestration but also triggers the release of already sequestered carbon, in turn exacerbating global warming. These climate change effects are interrelated, and small island nations continuously need to develop adaptive and mitigative strategies to deal with them. However, accurate and reliable research is needed to know the extent of the climate change effects and to make future predictions. Hence, this study develops a new hybrid Convolutional Neural Network (CNN) Multi-Layer Bidirectional Long Short-Term Memory (BiLSTM) deep learning model with Multivariate Variational Mode Decomposition (MVMD) to predict the sea level for study sites in the Solomon Islands and the Federated States of Micronesia (FSM). Three other artificial intelligence (AI) models (Random Forest (RF), multilinear regression (MLR) and multi-layer perceptron (MLP)) are used to benchmark the CNN-BiLSTM model. In addition, remotely sensed Landsat satellite imagery is used to assess and predict coastal wetland changes using a Random Forest (RF) classification model in the two small Pacific Island states. The CNN-BiLSTM model was found to provide the most accurate predictions (with a correlation coefficient of >0.99), and similarly a high level of accuracy (>0.98) was achieved using the Random Forest (RF) model to detect wetlands in both study sites. The mean sea levels were found to have risen 6.0 ± 2.1 mm/year in the Solomon Islands and 7.2 ± 2.2 mm/year in the FSM over the past two decades. Coastal wetlands in general were found to have decreased in total area at both study sites, with the Solomon Islands recording the greater decline in coastal wetland between 2009 and 2022.
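
The benchmarked architecture family, a 1-D CNN feature extractor feeding stacked bidirectional LSTM layers, can be sketched in Keras as below. The input shapes, layer sizes and random data are placeholders; the MVMD decomposition step and the study's actual predictors are not included.

```python
import numpy as np
from tensorflow.keras import layers, models

# Toy sea-level sequences: 12 monthly lags of one predictor per sample (shapes assumed).
X = np.random.rand(500, 12, 1).astype("float32")
y = np.random.rand(500, 1).astype("float32")

model = models.Sequential([
    layers.Input(shape=(12, 1)),
    layers.Conv1D(32, kernel_size=3, padding="same", activation="relu"),  # CNN feature extractor
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),         # BiLSTM layer 1
    layers.Bidirectional(layers.LSTM(32)),                                 # BiLSTM layer 2
    layers.Dense(1),                                                       # sea-level estimate
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
print("training MSE:", float(model.evaluate(X, y, verbose=0)))
```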
