Abstract

It has been proved that modified ridge regularized linear models (MRRLMs) can get "very close" to identifying a subset of the Markov boundary. This result, however, assumes that the covariance matrix is non-singular, so MRRLMs cannot discover the Markov boundary (subset) from data sets whose covariance matrix is singular. A singular covariance matrix indicates that the data set contains collinear variables, and such data sets are widespread in the real world. In this paper, we present a novel variant of ridge regularized linear models (VRRLMs) to identify a subset of the Markov boundary from data sets with either collinear or non-collinear variables, and we characterize theoretically the relationship between the singularity of the covariance matrix and the collinearity of variables. In addition, we prove that VRRLMs identify a subset of the Markov boundary under some reasonable assumptions and verify the theory on four discrete data sets. The results show that VRRLMs outperform MRRLMs in discovering a subset of the Markov boundary on data sets with collinear variables, while the two methods achieve similar discovery efficiency on data sets with non-collinear variables.

Highlights

  • Discovering causal relationships among variables from observational data sets is fundamental to many disciplines, such as computer science, medicine, statistics, economics and social science [1]–[4]

  • Existing algorithms are generally divided into two categories: constraint-based and score-based; constraint-based algorithms can be further divided into algorithms based on conditional independence tests (ACIT) and algorithms based on topology structure information (ATSI) [16]

  • By calculating the eigenvalues of the covariance matrix of each data set in advance, we detect collinearity among the variables; we repeat each experiment 10 times and report the average Markov boundary discovery efficiency in Table 4
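The eigenvalue check described above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's code: the synthetic data, the exactly collinear column, and the near-zero tolerance are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two independent predictors and one exactly collinear column (x3 = x1 + x2).
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x1, x2, x1 + x2])

# Sample covariance matrix of the predictors.
C = np.cov(X, rowvar=False)

# A collinear column drives the smallest eigenvalue of C to (numerically)
# zero, which is exactly what makes the covariance matrix singular.
eigvals = np.linalg.eigvalsh(C)
print(eigvals.min())  # ~0 up to floating-point noise
```

In practice one would flag collinearity whenever the smallest eigenvalue falls below a small tolerance rather than testing for exact zero, since sample covariances are computed in floating point.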


Summary

INTRODUCTION

Discovering causal relationships among variables from observational data sets is fundamental to many disciplines, such as computer science, medicine, statistics, economics and social science [1]–[4]. Researchers from Stanford University first connected Markov blankets with feature selection, proving theoretically that a Markov blanket of Y on a Bayesian network is an optimal feature subset for Y on the corresponding data set [15]; feature selection, one of the important pre-processing methods in machine learning, has in turn greatly promoted the application and development of Markov blanket theory. More recently, a method based on regularized linear models was proposed to identify the Markov blanket or boundary from data sets; to date, only two articles have applied regularized linear models to this problem. The third section briefly introduces the theoretical foundation of MRRLMs. In the fourth section, we reveal the relationship between the collinearity of variables and the singularity of the corresponding covariance matrix, then propose VRRLMs and prove their correctness under some assumptions. Section five presents the experimental results and analysis, and section six concludes.
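The link between collinearity and singularity that the fourth section formalizes can be illustrated numerically: an exactly collinear design yields a rank-deficient covariance matrix, while adding a ridge term λI restores full rank, which is what makes ridge-style estimators well defined on such data. This is a hedged sketch under assumed values; the matrix and the penalty λ = 0.1 are illustrative, not taken from the paper.

```python
import numpy as np

# Covariance matrix of three variables where the third is the sum of
# the first two (row3 = row1 + row2), so C is singular.
C = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])

lam = 0.1  # illustrative ridge penalty (assumed value)

print(np.linalg.matrix_rank(C))                     # 2: C is singular
print(np.linalg.matrix_rank(C + lam * np.eye(3)))   # 3: ridge restores full rank
```

Because every eigenvalue of C + λI is at least λ > 0, the regularized matrix is always invertible, regardless of how many collinear variables the data set contains.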

RELATED WORK
VARIANT RIDGE REGULARIZED LINEAR MODELS
SIMULATION
CONCLUSION AND FUTURE WORK
