E-commerce has attracted businesses of all sizes, retailers, and individuals. Consequently, there is an ongoing need for applications that can predict which products will trend and when best to sell them. This research proposes helping businesses forecast demand for various product categories by applying data mining algorithms to multivariate time series data. To ensure the information was as recent as possible, real-time data was gathered through APIs to form the first building block of this research. Search volume was derived from the Keywords Everywhere tool, while Amazon's search volume was derived from the Helium 10 tool, together with external features describing actual purchase data. The harvested raw datasets went through multiple processing steps to generate the final dataset, which was then validated. Five models, XGBoost, Linear Regression, Random Forest, Long Short-Term Memory (LSTM), and K-Nearest Neighbors (KNN), were employed to predict the trends, and their performance was assessed using four evaluation metrics: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the Coefficient of Determination (R2). Overall, Linear Regression performed best, especially at a correlation coefficient of 0.9, with R2 = 90.688, MAE = 0.038, MSE = 0.003, and RMSE = 0.057. KNN achieved its best results at a correlation coefficient of 0.7, with R2 = 85.129, MAE = 0.045, MSE = 0.005, and RMSE = 0.068. XGBoost achieved its best results at a correlation coefficient of 0.9, with R2 = 85.89, MAE = 0.042, MSE = 0.004, and RMSE = 0.062. Random Forest, on the other hand, peaked at a correlation coefficient of 0.6, with R2 = 84.854, MAE = 0.041, MSE = 0.004, and RMSE = 0.066.
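For readers unfamiliar with the four evaluation metrics, the following is a minimal sketch of their standard textbook definitions. It is not the study's code: the function name `regression_metrics` and the sample values are illustrative assumptions. The sketch returns R2 as an unscaled fraction; the R2 figures in the abstract appear to be reported on a 0 to 100 scale, which is an assumption on our part.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R2 for two equal-length sequences.

    Hypothetical helper for illustration; uses the standard definitions.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Residuals between observed and predicted values.
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n      # Mean Squared Error
    rmse = math.sqrt(mse)                     # Root Mean Squared Error
    mae = sum(abs(e) for e in errors) / n     # Mean Absolute Error

    # R2 = 1 - (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}


if __name__ == "__main__":
    # Made-up values, not data from the study.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.0, 2.0, 3.0, 5.0]
    print(regression_metrics(observed, predicted))
```

Lower MSE, RMSE, and MAE indicate smaller prediction errors, while an R2 closer to 1 (or 100 on a percentage scale) indicates that the model explains more of the variance in the target.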