Introduction: The nonlinear behaviour of activation functions is vital in Artificial Neural Networks (ANNs) for capturing the complex relationships between input and output features. However, the standard hyperbolic tangent and sigmoid functions are prone to the vanishing gradient problem: their small gradients lead to training instability and slow convergence, and their exponent operations are computationally expensive. Objectives: The primary objective of this study is to develop second-order Taylor expansions that realize the hyperbolic tangent and sigmoid functions. Long Short-Term Memory (LSTM) networks in particular make extensive use of these functions in their gating mechanisms to control the flow of information and gradients, and both custom functions can reduce vanishing gradient issues in recurrent neural networks. Methods: A parallel heterogeneous Long Short-Term Memory network with Taylor-expansion hyperbolic tangent and sigmoid activation functions, integrated with Bayesian hyperparameter optimization, is proposed for multi-step coronavirus time series prediction. Min-Max normalization is applied to scale the data to the range (0, 1). The normalized dataset is partitioned into training (80%) and testing (20%) sets, and both are prepared as input and target series using a window size of 5-7. The proposed model is tuned over key hyperparameters, namely the number of neurons, learning rate, dropout, and type of optimizer; the remaining settings are 200 epochs, a batch size of 32, and a mean squared error loss. Results: The efficacy of the proposed model is evaluated on coronavirus daily cumulative cases, cumulative deaths, daily new cases, and total recovered cases in India. The analysis reveals that the proposed model achieves remarkable performance in terms of Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Coefficient of Determination (R2 Score) when compared to existing models. Conclusions: The study shows that the proposed framework with the Taylor-approximation activation functions produces more consistent predictions than the default activation functions, tanh and sigmoid. Furthermore, the gradients of the Taylor tanh and sigmoid activation functions indicate a reduced likelihood of the vanishing gradient problem.
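The abstract does not spell out the exact second-order Taylor forms, so the following is only a minimal sketch of one plausible realization: the exponential inside the standard sigmoid and tanh definitions is replaced by its second-order Taylor polynomial about zero, which is consistent with the stated goal of avoiding expensive exponent operations. The function names and the expansion point are assumptions, not the paper's confirmed formulation.

```python
def taylor_exp(x):
    # Second-order Taylor polynomial of exp(x) about 0: 1 + x + x^2/2 (assumed form)
    return 1.0 + x + 0.5 * x * x

def taylor_sigmoid(x):
    # sigmoid(x) = 1 / (1 + exp(-x)), with exp replaced by its Taylor polynomial
    return 1.0 / (1.0 + taylor_exp(-x))

def taylor_tanh(x):
    # tanh(x) = (exp(2x) - 1) / (exp(2x) + 1), with exp replaced likewise
    e2x = taylor_exp(2.0 * x)
    return (e2x - 1.0) / (e2x + 1.0)
```

Under this assumption both approximations agree with the originals near zero (taylor_sigmoid(0) = 0.5 with slope 0.25, taylor_tanh(0) = 0 with slope 1), and their rational form makes the gradients decay polynomially rather than exponentially for large inputs, which is one way such approximations can ease vanishing gradients.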
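The preprocessing steps reported above (Min-Max scaling to (0, 1), an 80/20 split, and a sliding window of 5-7 observations) can be sketched as follows; the variable names and the choice of window = 5 within the reported range are illustrative assumptions.

```python
import numpy as np

def make_windows(series, window=5):
    # Build (input, target) pairs: each window of past values predicts the next value.
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(X), np.array(y)

series = np.asarray(daily_cases, dtype=float)                      # hypothetical raw daily series
scaled = (series - series.min()) / (series.max() - series.min())   # Min-Max scaling to (0, 1)
split = int(0.8 * len(scaled))                                      # 80% train / 20% test partition
X_train, y_train = make_windows(scaled[:split], window=5)
X_test, y_test = make_windows(scaled[split:], window=5)
```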
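A single-branch Keras sketch of the LSTM configuration is given below. The abstract does not describe the topology of the parallel heterogeneous network or the ranges searched by Bayesian optimization, so the layer size, learning rate, and dropout rate here are placeholder values that the optimizer would normally select; the 200 epochs, batch size of 32, and mean squared error loss are the settings reported above.

```python
import tensorflow as tf

# Taylor-series activations as in the earlier sketch (assumed form).
def taylor_exp(x):
    return 1.0 + x + 0.5 * x * x

def taylor_sigmoid(x):
    return 1.0 / (1.0 + taylor_exp(-x))

def taylor_tanh(x):
    e2x = taylor_exp(2.0 * x)
    return (e2x - 1.0) / (e2x + 1.0)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(5, 1)),        # window of 5 scaled values, 1 feature
    tf.keras.layers.LSTM(64,                    # placeholder unit count; tuned in the paper
                         activation=taylor_tanh,
                         recurrent_activation=taylor_sigmoid),
    tf.keras.layers.Dropout(0.2),               # placeholder dropout rate; tuned in the paper
    tf.keras.layers.Dense(1),                   # one-step-ahead prediction
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
# model.fit(X_train[..., None], y_train, epochs=200, batch_size=32)  # settings from the abstract
```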