A novel hybrid ARIMA-LSTM model for maritime shipping stock forecasting: comparative evidence against statistical and machine learning benchmarks

Abstract

Purpose: The sale and purchase of financial commodities has gained importance in recent years, leading to growing and diverse interest in investing in shipping companies. However, because of the shipping sector’s idiosyncratic features, investors and researchers constantly seek novel and more accurate forecasting techniques.

Design/methodology/approach: This research compares a traditional econometric model, ARIMA, with decision tree regression, a random forest model and an LSTM neural network model in predicting stock prices for six maritime companies. It also proposes a hybrid ARIMA-LSTM model that leverages the strengths of both approaches.

Findings: The LSTM model outperforms the ARIMA model in all six cases on the MAE, MSE and MAPE metrics. ARIMA, in turn, outperforms the decision tree regression and random forest models in the majority of cases. The results also demonstrate the superiority of the proposed hybrid model. The study shows that models can produce highly accurate predictions of maritime stocks using only their past values, despite the stocks’ strong dependence on macroeconomic factors.

Research limitations/implications: The findings extend the application of LSTM models and artificial neural networks to the prediction of an asset type that has been underexplored in the literature. The proposed model can also be applied to the prediction of other types of stocks.

Originality/value: We highlight the type of connection between a maritime stock and its past values. We also propose a new hybrid model for predicting the daily prices of maritime stocks and a strategy for selecting a forecasting model based on user requirements for accuracy and preparation time.
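The three error metrics named in the Findings (MAE, MSE and MAPE) can be illustrated with a minimal Python sketch. The function names and the price and forecast values below are hypothetical, chosen only for illustration; they are not taken from the study.

```python
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: average size of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error: penalises large errors more heavily than MAE."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up closing prices and forecasts (illustrative assumption, not study data).
prices = [10.0, 12.0, 11.0, 13.0]
forecast = [10.5, 11.5, 11.0, 12.0]

print(f"MAE  = {mae(prices, forecast):.3f}")   # 0.500
print(f"MSE  = {mse(prices, forecast):.3f}")   # 0.375
print(f"MAPE = {mape(prices, forecast):.2f}%")
```

Because MAPE is scale-free, it allows comparison across stocks with different price levels, whereas MAE and MSE are expressed in price units and squared price units respectively.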

Similar Papers
  • Research Article
  • Cited by 3
  • 10.38016/jista.922663
Estimation of High School Entrance Examination Success Rates Using Machine Learning and Beta Regression Models
  • Mar 15, 2022
  • Journal of Intelligent Systems: Theory and Applications
  • Tuba Koc + 1 more

Education is the foundation of economic, social, and cultural development for every individual and for society as a whole. In Turkey, students are admitted to secondary education institutions through the high school entrance examination administered by the Ministry of National Education. In this study, the success rates of the students who took the high school entrance examination in Turkey's 81 provinces in 2019 were modelled with machine learning regression and beta regression models. The paper aimed to model, predict, and explain students' success rates using variables such as divorce rate, gross domestic product, illiteracy, and higher education populations. Support vector regression, random forest, decision tree, and beta regression models were applied to estimate success rates. The two models with the highest R2 values were the beta regression and random forest models. When the prediction errors of these two models were examined, the random forest model appeared relatively superior to the beta regression model in predicting the success rates. While the beta regression model was the best predictor of the success rates of Çanakkale province, the random forest model predicted the success rates of Ankara well. In addition, the variables found to be significant in the beta regression model for success rates were also important in the random forest model. It is recommended to use both the beta regression and random forest models to estimate students' success rates.

  • Research Article
  • Cited by 34
  • 10.3390/rs13214372
Integration of a Crop Growth Model and Deep Learning Methods to Improve Satellite-Based Yield Estimation of Winter Wheat in Henan Province, China
  • Oct 30, 2021
  • Remote Sensing
  • Yi Xie + 1 more

Timely and accurate regional crop-yield estimates are crucial for guiding agronomic practices and policies to improve food security. In this study, a crop-growth model was integrated with time series of remotely sensed data through deep learning (DL) methods to improve the accuracy of regional wheat-yield estimations in Henan Province, China. Firstly, the time series of moderate-resolution imaging spectroradiometer (MODIS) normalized difference vegetation index (NDVI) were input into the long short-term memory network (LSTM) model to identify the wheat-growing region, which was further used to estimate wheat areas at the municipal and county levels. Then, the leaf area index (LAI) and grain-yield time series simulated by the Crop Environment REsource Synthesis for Wheat (CERES-Wheat) model were used to train and evaluate the LSTM, one-dimensional convolutional neural network (1-D CNN) and random forest (RF) models, respectively. Finally, an exponential model of the relationship between the field-measured LAI and MODIS NDVI was applied to obtain the regional LAI, which was input into the trained LSTM, 1-D CNN and RF models to estimate wheat yields within the wheat-growing region. The results showed that the linear correlations between the estimated wheat areas and the statistical areas were significant at both the municipal and county levels. The LSTM model provided more accurate estimates of wheat yields, with higher R2 values and lower root mean square error (RMSE) and mean relative error (MRE) values than the 1-D CNN and RF models. The LSTM model has an inherent advantage in capturing phenological information contained in the time series of the MODIS-derived LAI, which is important for satellite-based crop-yield estimates.

  • Research Article
  • 10.64252/2gc3dh43
Intelligent Techniques for Emotion Detection in Humans and Emotional States in Plants for Creating a Healthy Environment
  • May 23, 2025
  • International Journal of Environmental Sciences
  • Mukesh C Jain + 1 more

Emotion detection from text has gained significant attention in recent years due to its potential applications in various domains such as social media analysis, customer feedback analysis, and sentiment analysis. This research focuses on employing Natural Language Processing (NLP) techniques, including tokenizers and TF-IDF, along with different classifiers such as a hybrid model, LSTM model, and RF (Random Forest) model, for accurate emotion detection. The initial step involves data preprocessing, where tokenizers are utilized to break down the text into individual tokens or words, enabling further analysis. TF-IDF is then applied to assign weights to the tokens based on their frequency and importance in the document and across the corpus, respectively. This step helps identify the most significant words in the text data, allowing for a more focused analysis of emotions. Next, three different classifiers, namely a hybrid model, LSTM model, and RF model, are employed for emotion detection. The hybrid model combines the strengths of multiple ensemble models, including RF classifier, AdaBoost classifier, and Gradient Boosting classifier, using a voting classifier algorithm. The experimental findings provide strong evidence of the high accuracy achieved by both the hybrid model and LSTM model in detecting various emotions, such as happiness, sadness, fear, and anger. The hybrid model is also used to analyse psychological states in plants for creating a healthy environment. The hybrid model demonstrated exceptional performance, achieving an impressive testing accuracy rate of 94%, accompanied by precision and recall scores of 0.94 and 0.93, respectively. These results highlight the superior capability of these models in accurately classifying emotions from textual data. The robust performance of the hybrid model and LSTM model in emotion detection opens up numerous possibilities for their application in various fields. 
The ability to understand human emotions from text data can greatly inform decision-making processes in areas such as customer sentiment analysis, market research, social media monitoring, and psychological studies.
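The tokenization and TF-IDF weighting steps described in this abstract can be sketched as follows. This is a minimal standard-library illustration, assuming whitespace tokenization and the plain `tf * log(N / df)` weighting; the abstract does not state which tokenizer or TF-IDF variant the authors used, and the function name and example sentences are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[str]) -> List[Dict[str, float]]:
    """Return one {token: weight} mapping per document."""
    # Tokenize: lowercase and split on whitespace (a simplifying assumption).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each token appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency within the document times inverse document frequency.
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Made-up example sentences (not from the paper's dataset).
docs = ["i am happy today", "i am sad today", "i feel fear"]
weights = tf_idf(docs)
print(weights[0])
```

In this toy corpus the token "i" occurs in every document, so its weight is zero, while "happy" occurs in only one document and receives the largest weight in the first sentence. This is the sense in which TF-IDF highlights the words most useful for distinguishing emotions.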

  • Research Article
  • Cited by 53
  • 10.1016/j.envpol.2022.119420
Spatiotemporal variations of air pollutants and ozone prediction using machine learning algorithms in the Beijing-Tianjin-Hebei region from 2014 to 2021
  • May 5, 2022
  • Environmental Pollution
  • Yan Lyu + 5 more

  • Research Article
  • Cited by 1
  • 10.1016/j.nepr.2025.104580
Identifying predictors of nursing dropout and attrition before and after Bachelor's Graduation based on the IPOD model: A machine learning approach.
  • Oct 1, 2025
  • Nurse Education in Practice
  • Mahdieh Arian + 4 more

  • Research Article
  • Cited by 2
  • 10.1038/s41598-024-53481-7
Analysis of the factors influencing moderate to poor performance status in patients with cancer after chemotherapy: a cross-sectional study comparing three models
  • Feb 9, 2024
  • Scientific Reports
  • Ke Xi + 7 more

There are no models for assessing the factors that determine moderate to poor performance status in patients with cancer after chemotherapy. This study investigated the influencing factors and identified the best model for predicting moderate–poor performance status. A convenience sampling method was used. Demographic and clinical data and evaluation results for fatigue, pain, quality of life and Eastern Cooperative Oncology Group status were collected three days after the end of chemotherapy. Decision tree, random forest and logistic regression models were constructed. Ninety-four subjects in the case group had moderate to poor performance status, and 365 subjects in the control group had no or mild activity disorders. The random forest model was the most accurate model. Physical function, total protein, general quality of life within one week before chemotherapy, hemoglobin, pain symptoms and globulin were the main factors. Total protein and hemoglobin levels reflect nutritional status, and globulin levels are an index of liver function. Therefore, physical function, nutritional status, general quality of life and pain symptoms within one week before chemotherapy and liver function can be used to predict moderate–poor performance status. Nurses should pay more attention to patients with poor physical function, poor nutritional status, lower quality of life and pain symptoms after chemotherapy.

  • Research Article
  • Cited by 4
  • 10.3390/jrfm14100481
A Novel Model Structured on Predictive Churn Methods in a Banking Organization
  • Oct 12, 2021
  • Journal of Risk and Financial Management
  • Leonardo José Silveira + 2 more

A constant in the business world is the frequent movement of customers joining or abandoning companies’ services and products. The customer is one of the company’s most important assets. Reducing the customer abandonment rate has become a matter of survival and, at the same time, the most efficient way to maintain the customer base, since the replacement of dropouts by new customers costs, on average, 40% more. Aiming to mitigate the churn (customer evasion) phenomenon, this study compared predictive models to discover the most efficient method to identify customers who tend to drop out in the context of a banking organization. A literature review of related works on the subject found the neural network, decision tree, random forest and logistic regression models were the most cited, and thus the models were chosen for this work. Quantitative analyses were carried out on a sample of 200,000 credit operations, with 497 explanatory variables. The statistical treatment of the data and the developments of predictive models of churn were performed using the Orange data mining software. The most expressive results were achieved using the random forest model, with an accuracy of 82%.

  • Conference Article
  • Cited by 1
  • 10.1109/cis58238.2022.00024
Prediction of PM2.5 concentration in Changchun based on ensemble learning model
  • Dec 1, 2022
  • Yingjie Zhu + 2 more

With the continuous development of society, air quality is declining day by day. This study focuses on the concentration of PM2.5. Taking Changchun City as the research object and using data from 2014 to 2020, decision tree, random forest, AdaBoost, GBDT, KNN, XGBoost, LightGBM, CatBoost, SVR, Stacking, Blending and multiple linear regression models were established. The experimental results show that the concentration of PM2.5 in Changchun is high in spring and winter and low in summer and autumn. PM2.5 concentration has a strong correlation with air quality data but a weak correlation with meteorological data. The prediction models based on ensemble learning predict PM2.5 concentration well, and the goodness of fit of all ensemble learning models on the test set exceeds 0.92. The best prediction is obtained by a Stacking model that uses the random forest, LightGBM and gradient boosting models in the first layer and the KNN algorithm in the second layer; its goodness of fit on the test set is above 0.94. In terms of feature importance, PM10, CO, NO2, SO2 and other air quality factors have a great influence on the PM2.5 prediction results in Changchun.

  • Research Article
  • Cited by 21
  • 10.1080/10106049.2020.1831626
Spatial modelling of accidents risk caused by driver drowsiness with data mining algorithms
  • Sep 23, 2021
  • Geocarto International
  • Farbod Farhangi + 3 more

Driver drowsiness causes many road accidents, and preparing a risk map of these accidents with spatial criteria and data mining algorithms highlights accident points well. In this study, accidents risk caused by driver drowsiness in Qazvin province, Iran, was modelled using decision tree (DT), random forest (RF) and support-vector regression (SVR) algorithms in GIS environment. Seven spatial criteria including road segment length, road width, slope angle, speed limit, land use/cover, distance to service area and distance to speed camera were selected as effective criteria in modelling. The effect of criteria in modelling was applied using a fuzzy method, and three risk maps were prepared. Evaluation with ROC-AUC showed that the AUC for RF, SVR and DT models were 0.904, 0.863 and 0.805, respectively, and the RF model overall had the best performance. Examining the importance of criteria showed that the speed limit was the most important criterion for modelling.

  • Research Article
  • 10.54254/2754-1169/86/20240936
Comparison of Random Forest and LSTM in Stock Prediction
  • May 28, 2024
  • Advances in Economics, Management and Political Sciences
  • Haoyuan Wu

As an integral component of the financial market, stock prices have attracted the attention of many investors. Due to the frequent fluctuations and sensitivity to market dynamics, predicting stock prices is challenging. The volatility of stock prices and potential significant differences across different periods add to the difficulty of forecasting and reduce its accuracy. The Random Forest model and the LSTM model, as representative models in decision trees and deep learning algorithms respectively, demonstrate high accuracy and adaptability in predicting stock prices. The paper will separately utilize the Random Forest model and the LSTM model to fit the S&P 500 price data from 2013 to 2018 (represented by Apple's stock prices) as training and testing sets, and then compare the fitting results of the two models. The conclusion is as follows: In the absence of white noise in the data, the Random Forest model demonstrates smaller biases in predicting data compared to the LSTM model, and it can also respond more swiftly to price fluctuations.

  • Research Article
  • Cited by 1
  • 10.54254/2754-1169/49/20230493
Forecasting Sector Rotation of A-share Market Using LSTM and Random Forest
  • Dec 1, 2023
  • Advances in Economics, Management and Political Sciences
  • Liwen Yin

To improve the efficacy of stock prediction strategies, researching sector rotation is essential. This study addresses the sector rotation problem in the A-share market and proposes an approach that leverages LSTM and random forest models to forecast sector rotation trends. Extensive evaluations are conducted to assess the models' prediction accuracy, comparing different evaluation indicators. The random search algorithm is employed to optimize model parameters, while the adaptive learning rate Adam algorithm is utilized to enhance convergence performance. The final experimental results show that the LSTM model achieves 88% accuracy in predicting sector rotation in the A-share market, while the random forest model achieves an accuracy of 86%. Furthermore, a combination of LSTM and random forest based on the bagging algorithm (the LSTM-RF Bagging model) is examined in more depth and performs even better, with an accuracy of approximately 89%. The predictability of A-share market sector rotation is evident, and both the LSTM and random forest models, along with the new combination, prove to be suitable for forecasting. The findings in this paper serve as a valuable reference for investors, aiding them in making informed decisions regarding sector selection and asset allocation.

  • Research Article
  • Cited by 1
  • 10.2166/wst.2024.263
Novel optimized coupled rainfall model simulation based on stepwise decomposition technique.
  • Jul 31, 2024
  • Water Science and Technology: A Journal of the International Association on Water Pollution Research
  • Zhiwen Zheng + 4 more

Precipitation forecasting plays a pivotal role in guiding the effective management of regional water resources and providing crucial warnings for regional droughts and floods. Finding a monthly precipitation simulation model with robust fitting performance is a significant research endeavor in practical precipitation prediction. This paper introduces two modified African vulture optimization algorithms (MAVOA1 and MAVOA2). It provides hyperparameter optimization techniques for the least squares support vector machine (LSSVM), long short-term memory neural network (LSTM), and random forest (RF) models. These techniques are used to construct a monthly precipitation simulation model based on algorithmic optimization coupled with variational mode decomposition for full decomposition. The test results at five typical stations in the North China Plain reveal the following: (1) the LSSVM model demonstrates significantly better performance than the LSTM and RF models. (2) the MAVOA2-LSSVM model has the best-integrated effect: the average test fitting error is RMSE = 17.50 mm/month, MRE = 117.25%, NSE = 0.90, which shows its superiority in practical application and can significantly improve the accuracy of precipitation prediction; MAVOA2 is more suitable for machine learning models with more hyperparameters of its own, which provides a reference for hyperparameter optimization algorithms in the other fields.

  • Conference Article
  • Cited by 17
  • 10.1109/itnec48623.2020.9084974
Short-Term Metro Passenger Flow Prediction Based on Random Forest and LSTM
  • May 5, 2020
  • Shaofu Lin + 1 more

Rapid and accurate short-term passenger flow prediction plays an important and far-reaching role in passenger flow control and early warning. Short-term passenger flow is non-linear and random, and traditional machine learning algorithms can hardly meet current prediction requirements. In this paper, random forest (RF) is used to calculate feature importance in order to filter the extracted features and remove redundant ones, and the Long Short-Term Memory network (LSTM) model is applied to predict the short-term passenger flow of the metro. First, we calculate the out-of-bag (OOB) error of the features by RF, based on the characteristics of bootstrap sampling and regression trees, and calculate the OOB error again after adding noise. From the two OOB errors, the feature importance can be obtained through related formulas, and some features can be filtered out. RF can effectively reduce the number of redundant features involved in the calculation and improve operating efficiency. Second, we apply the LSTM model to predict passenger flow every 10 minutes for each station, using the important features selected by RF as the model inputs. LSTM performs very well on problems that are closely tied to time series, which makes it well suited to time series prediction. The proposed model is evaluated with real metro card data, and its prediction performance is compared with that of single RF, LSTM, and other algorithm models. The experimental results show that the accuracy of the combined RF and LSTM model is better than that of other existing models such as the RF and LSTM models. It shows good prediction accuracy and has far-reaching significance in the field of passenger flow prediction.

  • Book Chapter
  • Cited by 2
  • 10.1007/978-981-19-8825-7_40
Ground Water Quality Index Prediction Using Random Forest Model
  • Jan 1, 2023
  • Veena Khandelwal + 1 more

The present work uses machine learning to predict and assess the water quality index (WQI), which indicates overall water quality levels. The physiochemical parameters taken into account for the drinking water quality index are pH, calcium, magnesium, sulphate, chloride, nitrate, fluoride, total hardness, total alkalinity, iron and sodium in mg/l. The physiochemical parameters for the irrigation water quality index are electrical conductivity, residual sodium carbonate and SAR in mg/l. WQI is predicted from yearly ground water quality information from 01 January 2000 to 01 January 2018, using Central Ground Water Board (CGWB) data for Jaipur in the state of Rajasthan, India. The data contains information from 118 ground water points/stations in the Ganga Basin. The IS-10500 (June 2015) and IS:11624-1986 (Reaffirmed 2001) limits are used for calculating WQI for drinking and irrigation purposes, respectively. Decision tree regressor and random forest regression models were used to predict the water quality index, which is determined by the ground water physiochemical parameters. The random forest model outperformed the decision tree model, achieving higher accuracy with RMSE 10.92 and MAE 7.16.

  • Research Article
  • 10.54254/2755-2721/53/20241126
Random Forest model-based risk prediction of COVID-19 regional infection
  • Mar 28, 2024
  • Applied and Computational Engineering
  • Yang Li

The current prevalence of the COVID-19 pandemic worldwide has posed numerous challenges and questions. To assist governments, medical institutions, and the public in making informed decisions and minimize the risk of further spread of COVID-19, this paper employs the Random Forest model to predict the infection risk within certain regions. The dataset utilized underwent data cleaning and feature engineering, allowing predictions to be made using publicly accessible data such as local basic climate conditions. After conducting performance comparisons with other common machine learning models, including Linear Regression and Decision Tree Regressor, it was found that the Random Forest Regressor model exhibited superior performance across all evaluation metrics, with all error values below 0.05. Notably, the MAE for the Random Forest model was only 0.001089. This strongly suggests that the Random Forest model outperforms the other models used in this task.
