The article focuses on developing a predictive product pricing model using LightGBM. A further goal was to adapt the LightGBM method to regression problems, in particular to forecasting the price of a product that has no sales history, that is, under a cold start.
The article covers the concepts needed to understand how the light gradient boosting machine works: decision trees, boosting, random forests, gradient descent, GBM (Gradient Boosting Machine), and GBDT (Gradient Boosting Decision Trees). It also explains in detail the algorithms used to find split points, with a focus on the histogram-based approach.
LightGBM enhances the gradient boosting algorithm by introducing an automated feature selection mechanism and by giving special attention to training instances with larger gradients. This can lead to significantly faster training and better prediction performance. The two techniques behind these enhancements, Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), are described in detail, and the article presents the algorithms for both techniques as well as the complete LightGBM algorithm.
The work also includes an experiment. To test LightGBM, a real dataset from a Japanese C2C marketplace, published on Kaggle, was used, and the performance of LightGBM was compared with that of XGBoost (Extreme Gradient Boosting). LightGBM gave only a slight improvement over XGBoost in estimation quality (RMSE, MAE, R-squared), but the two differ markedly in training time: LightGBM trained almost three times faster than XGBoost, which makes it the better choice for large datasets.
This article is dedicated to the development and implementation of machine learning models for product pricing using LightGBM.
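The histogram-based approach to finding split points mentioned above can be illustrated with a small sketch. Instead of testing every distinct feature value as a threshold, the feature is bucketed into a fixed number of bins, the gradients and hessians are summed per bin in one pass over the data, and only bin boundaries are scanned as candidate splits. This is a minimal, hypothetical illustration, not LightGBM's implementation: the function names are invented, equal-width binning is an assumption (LightGBM builds its own bins, capped by its `max_bin` parameter), and the gain uses the common second-order formula with an L2 penalty `lam`.

```python
from typing import List, Optional, Tuple


def bin_index(x: float, lo: float, hi: float, num_bins: int) -> int:
    """Map a raw feature value to an equal-width bin in [0, num_bins - 1] (assumed binning)."""
    if hi == lo:
        return 0
    idx = int((x - lo) / (hi - lo) * num_bins)
    return min(idx, num_bins - 1)  # the maximum value falls into the last bin


def best_histogram_split(
    feature: List[float],
    grad: List[float],
    hess: List[float],
    num_bins: int = 8,
    lam: float = 1.0,
) -> Tuple[Optional[int], float]:
    """Return (b, gain): the best split sends bins 0..b left and the rest right."""
    lo, hi = min(feature), max(feature)
    g_hist = [0.0] * num_bins
    h_hist = [0.0] * num_bins
    # One pass over the data builds the per-bin gradient/hessian sums.
    for x, g, h in zip(feature, grad, hess):
        b = bin_index(x, lo, hi, num_bins)
        g_hist[b] += g
        h_hist[b] += h

    g_total, h_total = sum(g_hist), sum(h_hist)
    parent = g_total ** 2 / (h_total + lam)
    best_bin, best_gain = None, 0.0
    g_left = h_left = 0.0
    # Only bin boundaries are candidates: O(num_bins) work instead of O(n).
    for b in range(num_bins - 1):
        g_left += g_hist[b]
        h_left += h_hist[b]
        g_right, h_right = g_total - g_left, h_total - h_left
        if h_left == 0 or h_right == 0:
            continue  # skip splits that leave one side empty
        gain = (g_left ** 2 / (h_left + lam)
                + g_right ** 2 / (h_right + lam)
                - parent)
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain
```

For example, with squared-error loss the gradient is `prediction - price` and the hessian is 1. If eight items with feature values 1 to 8 have prices 10, 10, 10, 10, 50, 50, 50, 50 and the current prediction is the mean price 30, the gradients are `[20.0] * 4 + [-20.0] * 4`, and `best_histogram_split([1, 2, 3, 4, 5, 6, 7, 8], grad, [1.0] * 8)` picks bin 3, i.e. the threshold between feature values 4 and 5.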
The incorporation of automatic feature selection, a focus on high-gradient examples, and techniques such as GOSS and EFB demonstrates the model's versatility and efficiency. Such predictive models can help companies improve pricing for new products. The speed at which a forecast can be obtained for every item in a database is highly relevant at a time of rapid data accumulation.
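The focus on high-gradient examples in GOSS can be sketched as follows, following the sampling rule from the original LightGBM paper: keep the top `a` fraction of instances by absolute gradient, randomly sample a `b` fraction of the remaining instances, and multiply the sampled small-gradient instances' gradients and hessians by `(1 - a) / b` so that the information gain estimate is not biased toward the large-gradient instances. The function name and the seeded `random.Random` are illustrative assumptions; `a` and `b` play roughly the role of LightGBM's `top_rate` and `other_rate` parameters, whose defaults are 0.2 and 0.1.

```python
import random
from typing import List, Tuple


def goss_sample(
    grad: List[float],
    a: float = 0.2,
    b: float = 0.1,
    seed: int = 0,
) -> Tuple[List[int], List[float]]:
    """Gradient-based One-Side Sampling (illustrative sketch).

    Returns the selected instance indices and a per-instance weight that
    multiplies the gradient (and hessian) when building histograms.
    """
    if not (0 < a and 0 < b and a + b <= 1):
        raise ValueError("require a > 0, b > 0 and a + b <= 1")
    n = len(grad)
    top_n = int(a * n)
    rand_n = int(b * n)
    # Sort instance indices by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(grad[i]), reverse=True)
    top = order[:top_n]  # large-gradient instances: always kept, weight 1
    rng = random.Random(seed)
    rest = rng.sample(order[top_n:], rand_n)  # random subset of small-gradient instances
    amplify = (1 - a) / b  # compensates for under-sampling the small-gradient part
    indices = top + rest
    weights = [1.0] * len(top) + [amplify] * len(rest)
    return indices, weights
```

With 100 instances and the default rates, each tree is grown from 30 instances instead of 100: the 20 with the largest absolute gradients plus 10 random others, each of the latter weighted by 8. This reduction of the data scanned per iteration is where GOSS gets its speed-up.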