Music plays an important role in the lives of millions of people, and streaming platforms such as Spotify have become an integral part of modern culture. Track popularity matters greatly to the music industry because it affects artists' incomes and shapes musical trends. Predicting it is therefore an important task that can help artists, producers, and platforms better understand listener preferences and optimize their strategies. In this work, a data store of Spotify music tracks was developed from a physical database model, with its functionality implemented in SQL scripts. Work with the database is carried out through software that implements ETL processes and intelligent analysis of the selected data. The software classifies tracks by popularity level (0: not at all popular, 1: medium popularity, 2: hit) using numerical track metrics such as acousticness, tempo, valence, and liveness. SQLite serves as the database management system, and the application is written in Python. Several machine learning models are used to predict track popularity: KNeighbors, Decision Tree, Random Forest, and Extreme Gradient Boosting (XGBoost). The data mining software classifies tracks efficiently and displays the results graphically, so that users can easily interpret the predictions. Libraries used in the work: pandas, numpy, seaborn, matplotlib, tabulate, xgboost, scipy, sqlite3. The overall analysis showed that the XGBoost and Random Forest models are the most effective for predicting the popularity of music tracks. They demonstrate high accuracy and robustness to changes in the feature set, which makes them suitable for use in real-world conditions.
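The pipeline summarized above (load tracks into SQLite, extract numerical features, bin popularity into three classes, classify) can be illustrated with a minimal sketch. It is not the authors' implementation: the sample rows, the table layout, the 30/70 class thresholds, the crude tempo scaling, and all function names are assumptions made for illustration, and a small pure-Python nearest-neighbour vote stands in for the library models named in the abstract.

```python
import sqlite3
from collections import Counter
from math import dist

# Hypothetical sample rows, not real Spotify data:
# (acousticness, tempo, valence, liveness, popularity on a 0-100 scale)
ROWS = [
    (0.90, 80.0, 0.20, 0.10, 5),
    (0.85, 75.0, 0.25, 0.12, 12),
    (0.40, 110.0, 0.50, 0.20, 45),
    (0.35, 115.0, 0.55, 0.18, 50),
    (0.05, 128.0, 0.90, 0.30, 85),
    (0.08, 125.0, 0.85, 0.28, 92),
]


def popularity_class(score, low=30, high=70):
    """Map a 0-100 score to 0 (not popular), 1 (medium), 2 (hit).

    The thresholds are assumed; the abstract does not state them.
    """
    if score < low:
        return 0
    if score < high:
        return 1
    return 2


def build_db(rows):
    """Load step: create an in-memory SQLite table and insert the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tracks (acousticness REAL, tempo REAL, "
        "valence REAL, liveness REAL, popularity INTEGER)"
    )
    conn.executemany("INSERT INTO tracks VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def extract(conn):
    """Extract/transform step: return (features, class) pairs.

    Tempo is divided by 200 as a crude, assumed scaling so that it is
    comparable to the other features, which lie in [0, 1].
    """
    cur = conn.execute(
        "SELECT acousticness, tempo / 200.0, valence, liveness, popularity "
        "FROM tracks"
    )
    return [(tuple(features), popularity_class(pop)) for *features, pop in cur]


def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


conn = build_db(ROWS)
train = extract(conn)
# Classify a hypothetical upbeat, non-acoustic track.
predicted = knn_predict(train, (0.06, 126.0 / 200.0, 0.88, 0.29))
```

In a fuller version, the extracted rows would typically be read into a pandas DataFrame and passed to the tree-based and boosting models, with the class thresholds chosen from the actual popularity distribution.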