Abstract
Linear regression is one of the oldest statistical modeling approaches. It remains a valuable tool, particularly when forecast models must be built from small samples. When researchers use this method and have numerous potential regressors, choosing a set of regressors that yields a model fulfilling all the required assumptions can be challenging. To address this, the authors developed an open-source Python script that automatically tests all combinations of regressors using a brute-force approach. The output displays the best linear regression models according to the thresholds set by users for the required assumptions: statistical significance of the estimates, multicollinearity, error normality, and homoscedasticity. Further, the script allows the selection of linear regressions whose regression coefficients match the user's expectations. The script was tested with an environmental dataset to predict surface water quality parameters based on landscape metrics and contaminant loads. Among millions of possible combinations, fewer than 0.1% of the regressor combinations fulfilled the requirements. The resulting combinations were also tested in geographically weighted regression, with results similar to those of linear regression. Model performance was higher for pH and total nitrate and lower for total alkalinity and electrical conductivity.
• A Python script was developed to find the best linear regressions within a dataset.
• Output regressions are automatically selected based on regression coefficient expectations set by the user and the linear regression assumptions.
• The algorithm was successfully validated with an environmental dataset.
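To make the brute-force idea concrete, the following is a minimal sketch and not the authors' script. It uses only the Python standard library, and every name in it (`solve`, `ols`, `max_vif`, `search`, `vif_limit`, `min_r2`, `expected_signs`) is hypothetical, as is the toy data. It covers only two of the screening criteria, multicollinearity (through the variance inflation factor) and user-expected coefficient signs, plus a minimum R² that is an assumption of this sketch. Significance, normality, and homoscedasticity tests are omitted and would in practice come from a statistics library.

```python
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(columns, y):
    """Fit y = b0 + sum(b_j * x_j) via normal equations; return (beta, R^2)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


def max_vif(columns):
    """Largest variance inflation factor, VIF_j = 1 / (1 - R_j^2)."""
    if len(columns) < 2:
        return 1.0
    vifs = []
    for i, col in enumerate(columns):
        _, r2 = ols(columns[:i] + columns[i + 1:], col)
        vifs.append(float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return max(vifs)


def search(data, target, max_size=2, vif_limit=5.0, min_r2=0.8,
           expected_signs=None):
    """Try every regressor combination up to max_size; keep those that pass."""
    expected_signs = expected_signs or {}
    names = [name for name in data if name != target]
    kept = []
    for size in range(1, max_size + 1):
        for combo in combinations(names, size):
            cols = [data[name] for name in combo]
            try:
                if max_vif(cols) > vif_limit:
                    continue  # regressors too collinear
                beta, r2 = ols(cols, data[target])
            except ZeroDivisionError:
                continue  # singular system, e.g. perfectly collinear columns
            # beta[0] is the intercept; a sign of 0 means "no expectation".
            signs_ok = all(beta[i + 1] * expected_signs.get(name, 0) >= 0
                           for i, name in enumerate(combo))
            if signs_ok and r2 >= min_r2:
                kept.append((combo, r2))
    return sorted(kept, key=lambda item: -item[1])


# Toy data (hypothetical): y = 2*x1 - x3 + 1, and x2 closely tracks x1.
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 1, 4, 3, 6, 5, 8, 7],
    "x3": [5, 3, 8, 1, 9, 2, 7, 4],
    "y": [-2, 2, -1, 8, 2, 11, 8, 13],
}
results = search(data, "y")
for combo, r2 in results:
    print(combo, round(r2, 3))
```

In this toy run the pair `("x1", "x2")` is rejected for multicollinearity, while passing `expected_signs={"x3": 1}` would also drop `("x1", "x3")`, because the fitted coefficient of `x3` is negative. The number of combinations grows combinatorially with the number of candidate regressors, which is why the sketch caps the combination size with `max_size`.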