Abstract

Nature creates variables from its character components, and variables share characters on scales ranging from very small to relatively large. As a result, variables range from very different to quite similar in character, which gives rise to relationships among them. The literature suggests different measures of relation depending on the nature of the variables and the type of relationship. Today, because large data sets of high variety are produced frequently, the currently suggested variable filtering and selection methods fall short of the need. This research aims to fill this gap by comparing the methods suggested in the literature in order to identify better variable selection and dimension reduction methods. A regression analysis using all factors suggested in the literature shows that none of the predictors of the development status of an enterprise are significant, and that only 10 of the 81 factors are significant predictors of the number of employees in an enterprise. Variable selection and dimension reduction methods are therefore applied to find the predictors of a response by removing variable redundancy and the complexity of incorporating a large number of variables. Based on statistical power, the results from the variable selection methods, especially the association and correlation methods, show that CANOVA more efficiently detects non-linear or non-monotonic correlation between continuous-continuous and continuous-categorical variable pairs. Spearman’s correlation coefficient more efficiently detects monotonic correlation between two continuous variables and between a continuous and a categorical variable. The Pearson correlation coefficient more efficiently detects linear correlation between continuous variables. MIC efficiently detects non-linear or non-monotonic relations between continuous variables.
The chi-square test of independence efficiently detects relations between two continuous variables and between two categorical variables, but it does not detect non-linear or non-monotonic relations between a continuous and a categorical variable well. On the other hand, the results from the lasso and stepwise methods reveal that relations between predictor and response that arise from interaction effects, which the correlation and association methods miss, are detected by the stepwise variable selection method, and that multicollinearity is detected and removed by the lasso method. Regressing the response variable “number of employees in an enterprise” on the variables selected by the lasso and stepwise methods yields a better model fit (based on the adjusted R-squared value) than regressing it on the variables selected by the association and correlation methods. In contrast, regressing the response variable “development status of an enterprise” on the variables selected by the association and correlation methods yields 12 significant variables, whereas none of the variables selected by the lasso and stepwise methods are significant. As a result, 51 predictors of the number of employees in an enterprise and 40 predictors of the development status of an enterprise are detected as significantly related variables. The lasso and stepwise methods are therefore preferred for selecting predictors of the continuous response variable “number of employees in an enterprise”, and the association and correlation methods are preferred for selecting predictors of the categorical response variable “development status of an enterprise”. Finally, the reduced regression models reveal that 20 predictors have a causal relation with the number of employees in an enterprise, and 12 predictors have a causal relation with the development status of an enterprise.
On the other hand, based on model fit, information loss, and the number of significant factors, the principal factor method is preferred and applied for dimension reduction with the categorical response variable “development status of an enterprise”, and factor-score-based regression is preferred and applied for the continuous response variable “number of employees in an enterprise”. However, comparing the results of variable selection and dimension reduction indicates that the variable selection methods give a greater gain in model fit than the dimension reduction methods. Hence, the suggested variable selection methods are preferred over the dimension reduction methods and are applied to find the predictors. In general, the suggested variable selection procedure is recommended when a small number of variables is studied, and the suggested dimension reduction methods are recommended for a large number of varied variables (the big data case).
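To make the contrast among the correlation measures concrete, the following is a minimal Python sketch, not the study's own code. It uses synthetic data (an assumption; the enterprise data set is not reproduced here) and hand-written helpers `pearson`, `ranks`, and `spearman`, whose names are chosen only for this illustration. It shows that Pearson's coefficient understates a monotonic but non-linear relation that Spearman's coefficient captures fully, and that both coefficients are close to zero for a symmetric non-monotonic relation, which is the kind of case the abstract assigns to CANOVA and MIC (neither is implemented here).

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Synthetic data (assumed for illustration): 61 evenly spaced points on [-3, 3].
x = [i / 10 for i in range(-30, 31)]
monotonic = [math.exp(v) for v in x]   # strictly increasing, non-linear
non_monotonic = [v * v for v in x]     # symmetric parabola

r_p_mono = pearson(x, monotonic)       # clearly below 1
r_s_mono = spearman(x, monotonic)      # essentially 1: ranks agree exactly
r_p_non = pearson(x, non_monotonic)    # essentially 0
r_s_non = spearman(x, non_monotonic)   # essentially 0

print(f"monotonic:     Pearson={r_p_mono:.3f}  Spearman={r_s_mono:.3f}")
print(f"non-monotonic: Pearson={r_p_non:.3f}  Spearman={r_s_non:.3f}")
```

The helpers are written out by hand so that the sketch needs only the standard library; in practice a statistics package would normally supply both coefficients.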

Highlights

  • Nature creates variables from its character components, and variables share characters on scales ranging from very small to relatively large

  • The development status of an enterprise is directly and significantly correlated with level of education, that is, with an enterprise having employees who graduated from high school, college, or university

  • The regression analysis using all factors suggested in the literature shows that none of the predictors of the development status of an enterprise are significant, and that only 10 of the 81 factors are significant predictors of the number of employees in an enterprise
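Because the full model uses 81 factors while the reduced models use far fewer, the study compares model fit with adjusted R-squared, which penalizes the number of predictors. The sketch below implements the standard formula, 1 - (1 - R²)(n - 1)/(n - p - 1). The function name `adjusted_r2`, the sample size, and the R-squared value are hypothetical and chosen only for illustration; none of them come from the study.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared for a linear model with an intercept.

    r2           -- ordinary R-squared of the fitted model
    n_obs        -- number of observations
    n_predictors -- number of predictors, excluding the intercept
    """
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


# Hypothetical numbers: the same raw R-squared with 81 versus 20 predictors.
full_model = adjusted_r2(0.90, n_obs=500, n_predictors=81)
reduced_model = adjusted_r2(0.90, n_obs=500, n_predictors=20)

print(f"81 predictors: {full_model:.4f}")
print(f"20 predictors: {reduced_model:.4f}")
```

For a fixed raw R-squared, the model with fewer predictors gets the higher adjusted value, so a reduced model can score better on this criterion even when its raw fit is not higher.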


Summary

Introduction

Nature creates variables from its character components, and variables share characters on scales ranging from very small to relatively large.

Results and discussion

The linear regression of the number of employees on all 81 factors suggested in the literature (Appendix: Tables 12, 13, 14, 15, 16 and 17) reveals that only 10 variables are significant (h4, h3, IF4, IF8, Grouping, X15.29, X50.65, ed0, ed1, and emp_male), with an adjusted R-squared of 0.9992. The logistic regression of the development status of an enterprise (Appendix: Tables 12, 13, 14, 15, 16 and 17) indicates that none of the 81 factors is a significant predictor. To address this problem, variable selection and dimension reduction methods are applied to find the real predictors of a response by removing variable redundancy and the complexity of having a large number of variables. This result suggests that, as the number of employees in an enterprise increases, employment by gender is proportional, employment by education category increases significantly (mainly, employees with primary education are employed in large numbers), and employment by age category

