Abstract

Selecting a proper model for a data set is a challenging task. In this article, an attempt was made to find a suitable model for a given data set. A general linear model (GLM) was introduced along with three methods for estimating its parameters: ordinary least squares (OLS), generalized least squares (GLS), and feasible generalized least squares (FGLS). For GLS, two different weights were selected to reduce the severity of heteroscedasticity, and the appropriate weight(s) were deployed; a third weight was obtained by applying FGLS. The analyses showed that only two of the three weights, including the FGLS weight, were effective in reducing the severity of heteroscedasticity. In addition, each data set was divided into training, validation, and testing portions, producing a more reliable set of estimates for the model parameters. Partitioning data is a relatively new approach in statistics, borrowed from the field of machine learning. Stepwise and forward selection methods, together with a number of statistics including the average squared error on the testing data (ASE testing), Adj. R-Sq, AIC, AICC, and the average squared error on the validation data (ASE validate), along with proper hierarchies, were deployed to select more appropriate model(s) for each data set. Furthermore, the response variable in both data files was transformed using the Box-Cox method to meet the assumption of normality; the analysis showed that the logarithmic transformation resolved this issue satisfactorily. Because heteroscedasticity, model selection, and data partitioning have not been addressed in fisheries, the 2015 and 2016 shrimp data from the Gulf of Mexico (GOM) were selected, for introduction and demonstration purposes only, and the above methods were applied to these data sets. Finally, several variations of the GLM were identified as possible leading candidates for these data sets.
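The OLS-then-FGLS sequence summarized above can be illustrated with a minimal Python sketch. It assumes a single predictor and, as a hypothetical choice of variance model, estimates the error variance by regressing the log of the squared OLS residuals on that predictor (a common textbook form of FGLS); the weights are then 1 / estimated variance. This is not the authors' exact procedure, and the names `wls_line` and `fgls_line` are illustrative.

```python
import math
import random


def wls_line(x, y, w):
    """Weighted least squares fit of y = b0 + b1 * x (all weights 1 gives OLS)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def fgls_line(x, y):
    """Two-step FGLS sketch; returns ((b0, b1), weights)."""
    ones = [1.0] * len(x)
    # Step 1: OLS fit and its residuals.
    b0, b1 = wls_line(x, y, ones)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Step 2 (assumed variance model): regress log(e^2) on x.
    log_e2 = [math.log(r * r + 1e-12) for r in resid]  # 1e-12 guards against log(0)
    g0, g1 = wls_line(x, log_e2, ones)
    # Step 3: weight each observation by 1 / estimated variance and refit.
    weights = [math.exp(-(g0 + g1 * xi)) for xi in x]
    return wls_line(x, y, weights), weights


# Usage: simulated data whose error standard deviation grows with x.
random.seed(1)
x = [random.uniform(1, 10) for _ in range(2000)]
y = [2 + 0.5 * xi + random.gauss(0, 0.3 * xi) for xi in x]
print("OLS :", wls_line(x, y, [1.0] * len(x)))
(b0_fgls, b1_fgls), w = fgls_line(x, y)
print("FGLS:", (b0_fgls, b1_fgls))
```

Both fits are unbiased for the slope here; the point of the reweighting is that observations with larger estimated variance (larger x in this simulation) receive smaller weights.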

Highlights

  • Finding a suitable model for a given data set is a challenging task

  • Because heteroscedasticity, model selection, and data partitioning have not been addressed in fisheries, the 2015 and 2016 shrimp data from the Gulf of Mexico (GOM) were used, for introduction and demonstration purposes only, to apply these methods

  • Heteroscedasticity is a statistical term meaning that the variability of a response variable is unequal across the range of its predictor(s); it is quite common in fishery data sets
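A common way to check a data set for the unequal variance described in the last highlight is the Lagrange Multiplier (LM) test of Breusch and Pagan (1979). The Python sketch below computes the original statistic for a single predictor: half the explained sum of squares from regressing the scaled squared OLS residuals on the predictor, compared against a chi-square distribution with one degree of freedom (5% critical value about 3.841). The single-regressor setup and the function names are simplifying assumptions for illustration.

```python
import math
import random


def ols_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1


def breusch_pagan(x, y):
    """Breusch-Pagan LM test with x as the only variance regressor.

    Returns (LM, p_value); under homoscedastic normal errors LM is
    approximately chi-square with 1 degree of freedom.
    """
    n = len(x)
    b0, b1 = ols_line(x, y)
    e2 = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    sigma2 = sum(e2) / n                 # ML estimate of the error variance
    g = [v / sigma2 for v in e2]         # scaled squared residuals (mean 1)
    a0, a1 = ols_line(x, g)              # auxiliary regression of g on x
    gbar = sum(g) / n
    ess = sum((a0 + a1 * xi - gbar) ** 2 for xi in x)  # explained sum of squares
    lm = ess / 2.0
    # Upper tail of chi-square(1): P(chi2 > lm) = erfc(sqrt(lm / 2)).
    return lm, math.erfc(math.sqrt(lm / 2.0))


random.seed(7)
x = [random.uniform(1, 10) for _ in range(500)]
y_het = [2 + 0.5 * xi + random.gauss(0, 0.3 * xi) for xi in x]  # sd grows with x
y_hom = [2 + 0.5 * xi + random.gauss(0, 1.0) for xi in x]       # constant sd
print(breusch_pagan(x, y_het))
print(breusch_pagan(x, y_hom))
```

A large LM (small p-value) for the first simulated response signals heteroscedasticity; the constant-variance response should give a much smaller statistic.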


Summary

Introduction

Finding a suitable model for a given data set is a challenging task, and the problem becomes more complex as the number of potential covariates increases. Heteroscedasticity is a statistical term meaning that the variability of a response variable is unequal across the range of its predictor(s); it is quite common in fishery data sets and can result from violating other assumptions. Breusch and Pagan (1979) addressed this issue and developed a Lagrange Multiplier (LM) test for detecting heteroscedasticity in a data set. When the data record is sufficiently large, the data set can be divided into two or three parts, each responsible for a particular task, much as each unit in an organization is responsible for a particular activity. This technique is used in some areas of machine learning, where a portion of the data is used to train the system. The validation portion was used for purposes such as terminating the selection process or selecting the final model.
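The training/validation/testing idea above can be sketched in Python as follows. The 60/20/20 split, the candidate models (an intercept-only model plus one single-covariate line per covariate), and all function names are illustrative assumptions rather than the paper's setup; the sketch only shows that models are fitted on the training part, the final model is chosen by its average squared error (ASE) on the validation part, and the testing part yields an ASE that played no role in fitting or selection.

```python
import random


def partition(rows, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly split rows into training, validation and testing parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(fractions[0] * len(rows))
    n_valid = int(fractions[1] * len(rows))
    return rows[:n_train], rows[n_train:n_train + n_valid], rows[n_train + n_valid:]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((v - xbar) ** 2 for v in xs)
    b1 = sum((v - xbar) * (w - ybar) for v, w in zip(xs, ys)) / sxx
    return ybar - b1 * xbar, b1


def line_predictor(b0, b1, j):
    """Prediction function for a one-covariate line using feature j."""
    return lambda feats: b0 + b1 * feats[j]


def ase(rows, predict):
    """Average squared error of predict() over (features, y) rows."""
    return sum((y - predict(f)) ** 2 for f, y in rows) / len(rows)


def select_by_validation(rows, n_features):
    train, valid, test = partition(rows)
    ys = [y for _, y in train]
    mean_y = sum(ys) / len(ys)
    # Candidate models are fitted on the training part only.
    candidates = {"intercept only": lambda feats: mean_y}
    for j in range(n_features):
        b0, b1 = fit_line([f[j] for f, _ in train], ys)
        candidates[f"x{j}"] = line_predictor(b0, b1, j)
    # The validation part chooses the final model ...
    ase_valid = {name: ase(valid, p) for name, p in candidates.items()}
    best = min(ase_valid, key=ase_valid.get)
    # ... and the untouched testing part reports its ASE.
    return best, ase_valid, ase(test, candidates[best])


rng = random.Random(3)
rows = []
for _ in range(1000):
    feats = (rng.random(), rng.random(), rng.random())
    rows.append((feats, 1 + 3 * feats[1] + rng.gauss(0, 0.5)))
best, ase_valid, ase_test = select_by_validation(rows, n_features=3)
print(best, round(ase_test, 3))
```

In this simulation only the second covariate (`x1`) drives the response, so it should win on validation ASE; in practice the candidate set would come from a selection method such as stepwise or forward selection.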

Methodology