Predictive analytics with music: Advancing tree-based models for song rating prediction

Jiaxuan Xu

doi:10.54254/2755-2721/52/20241655

Abstract

This paper applies predictive analytics to a dataset of 19,485 songs from the Kaggle Predictive Analysis Competition (PAC), with the objective of forecasting song ratings from auditory features. The study employs a range of tree-based models, including regression trees, bagging, and random forests, and addresses several data preprocessing challenges, particularly handling missing data and incorporating genre as a significant predictive feature. Creating dummy variables for genre classification, combined with careful model selection, yields an improved approach to predictive accuracy. Model performance is evaluated using the test Root Mean Square Error (RMSE). The paper contributes to music analytics by analyzing tree-based predictive models and their application to song rating prediction, and it highlights the importance of methodological refinement in predictive analytics.

Full Text