Careful variable selection is extremely important in problems with large volumes of data: it reduces model complexity, improves the interpretability of results, and increases computational efficiency, leading to more accurate and relevant analyses. This paper presents a comprehensive approach to selecting variables in multiple regression models using the stepwise procedure. As the main contribution of this study, we present a stepwise function implemented in Python that allows intuitive and efficient selection of statistically significant variables, improving the effectiveness of statistical analyses. The application of the function is illustrated in a real-world case study of real estate pricing, validating its effectiveness in improving the fit of regression models. In addition, we present a methodological framework for jointly addressing problems in data analysis such as heteroskedasticity, multicollinearity, and non-normality of residuals. This framework offers a robust computational implementation to mitigate such issues. This study aims to advance the understanding and application of statistical methods in Python, providing valuable tools for researchers, students, and professionals from various areas.