Establishing an efficient and reliable fuel consumption prediction model from ship big data facilitates optimized decision-making and supports the green and intelligent development of ships. As data are crucial in model construction, this study presents a general multi-source data processing method for obtaining high-quality training data. A long short-term memory (LSTM) neural network, which is suitable for time-series data, was used to develop a black-box fuel consumption model. This model was combined with a theoretical fuel consumption model derived from ship theory to produce an LSTM-based gray-box model. We explored the impact of data diversity, quality, and quantity on the black-box and gray-box models. Analysis of the navigation data of a passenger ship and meteorological data collected from the European Centre for Medium-Range Weather Forecasts (ECMWF) and MeteoBlue indicated that the combination of variables selected using the least absolute shrinkage and selection operator (LASSO) yielded the best overall prediction performance. Moreover, the gray-box model was relatively stable with respect to changes in the effective variables. An analysis of data quality revealed that systematic processing of outliers improves the accuracy of both models by 6.19% compared with direct deletion. Furthermore, the gray-box models require less data than the black-box models to achieve higher accuracy.