Abstract

Model selection is the process of choosing a model from a set of candidate models. A model that generalises well fits both current and future data. Although many automated model selection procedures have emerged, few studies address procedures for multiple-equation models, particularly seemingly unrelated regression equations (SURE) models. This study therefore concentrates on an automated model selection procedure for the SURE model that integrates expectation-maximization (EM) algorithm estimation, named SURE(EM)-Autometrics. The procedure extends Autometrics, which is applicable only to a single equation. To assess the performance of SURE(EM)-Autometrics, a simulation analysis was conducted under two strengths of correlation among equations and two levels of significance for a two-equation model with up to 18 variables in the initial general unrestricted model (GUM). Three econometric models were used as a testbed for the true specification search. The results were divided into four categories; a tight significance level of 1% yielded a high percentage of cases in which all equations in the model contained variables precisely matching the true specifications. An empirical comparison of four model selection techniques was then conducted using water quality index (WQI) data. System selection, which selects all equations in the model simultaneously, proved more efficient than single-equation selection. SURE(EM)-Autometrics dominated the comparison, ranking first on most of the error measures. Hence, integrating EM algorithm estimation is appropriate for improving the performance of automated model selection procedures for multiple-equation models.

Highlights

  • To make scientific discoveries or anticipate future outcomes, data analysts use a variety of statistical models and methodologies to analyse observable data

  • Because ordinary least squares (OLS) ignores the correlations of the disturbances, the generalised least squares (GLS) estimation technique is commonly accepted to be more efficient

  • This simulation started with all water quality index (WQI) data available in hand
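The highlight on OLS versus GLS can be illustrated with a minimal sketch. It assumes a two-step feasible GLS estimator for a two-equation SURE system, which is a swapped-in textbook technique and not the EM-based estimation or the Autometrics search described in the abstract. The function name sur_fgls, the simulated data, the coefficient values, and the error correlation of 0.9 are all hypothetical choices made for illustration.

```python
import numpy as np


def sur_fgls(X_list, y_list):
    """Two-step feasible GLS for a SURE system (illustrative sketch only).

    X_list: one (n, k_i) regressor matrix per equation.
    y_list: one length-n response vector per equation.
    Returns (per-equation OLS estimates, per-equation FGLS estimates,
    estimated cross-equation error covariance).
    """
    n = y_list[0].shape[0]
    m = len(X_list)

    # Step 1: equation-by-equation OLS, which ignores cross-equation correlation.
    ols = [np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(X_list, y_list)]
    resid = np.column_stack([y - X @ b for X, y, b in zip(X_list, y_list, ols)])

    # Estimate the m x m covariance of the disturbances from the OLS residuals.
    sigma = resid.T @ resid / n

    # Step 2: GLS on the stacked system with a block-diagonal design matrix.
    ks = [X.shape[1] for X in X_list]
    X_big = np.zeros((m * n, sum(ks)))
    col = 0
    for i, X in enumerate(X_list):
        X_big[i * n:(i + 1) * n, col:col + ks[i]] = X
        col += ks[i]
    y_big = np.concatenate(y_list)

    # Observations are stacked equation by equation, so the inverse error
    # covariance is inv(sigma) kron I_n.
    omega_inv = np.kron(np.linalg.inv(sigma), np.eye(n))
    beta = np.linalg.solve(X_big.T @ omega_inv @ X_big,
                           X_big.T @ omega_inv @ y_big)
    return ols, np.split(beta, np.cumsum(ks)[:-1]), sigma


# Hypothetical simulated data: two equations with strongly correlated errors.
rng = np.random.default_rng(0)
n = 200
errors = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=n)
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])
y1 = X1 @ np.array([1.0, 2.0]) + errors[:, 0]
y2 = X2 @ np.array([-1.0, 0.5]) + errors[:, 1]

ols, fgls, sigma_hat = sur_fgls([X1, X2], [y1, y2])
```

In this sketch the efficiency gain comes only from the off-diagonal entries of sigma_hat. When every equation uses the same regressors, the FGLS estimates coincide with OLS, so the system approach helps only when the equations differ in their regressors and the disturbances are correlated.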

Summary

Introduction

To make scientific discoveries or anticipate future outcomes, data analysts use a variety of statistical models and methodologies to analyse observable data. Regardless of the data and fitting processes used, selecting the most suitable model or approach from a pool of candidates is an important step. Model selection is an essential part of data analysis for scientific investigations because it is needed to obtain accurate statistical inferences or predictions [1], [2]. Afterwards, the results of hypothesis tests on the individual parameters are used to identify significant variables or to conduct diagnostic checking of the model's assumptions [3]. This entire procedure may be done automatically or manually. Repetitive manual retraining and recalibration of models is frequently prohibitively expensive, time-consuming, and in some circumstances impossible [4].

Mathematics and Statistics 10(1): 222-232, 2022
