Abstract

In the era of big data, there is increasing interest in clustering variables to minimize data redundancy and maximize variable relevancy. Existing clustering methods, however, depend on nontrivial assumptions about the data structure. In particular, nonlinear interdependence among variables poses significant challenges to the traditional framework of predictive modeling. In the present work, we reformulate the problem of variable clustering from an information-theoretic perspective that does not require assumptions about the data structure to identify nonlinear interdependence among variables. Specifically, we propose the use of mutual information to characterize and measure nonlinear correlation structures among variables. Further, we develop Dirichlet process (DP) models to cluster variables based on the mutual-information measures among them. Finally, the orthonormalized variables in each cluster are integrated with a group elastic-net model to improve the performance of predictive modeling. Both simulation and real-world case studies show that the proposed methodology not only effectively reveals the nonlinear interdependence structures among variables but also outperforms traditional variable clustering algorithms such as hierarchical clustering.
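To illustrate why mutual information is suited to measuring nonlinear interdependence, the following is a minimal sketch rather than the estimator used in this work. It assumes a simple histogram (plug-in) estimate of mutual information; the function name `binned_mutual_information`, the bin count, and the synthetic data are all hypothetical choices made for illustration. It compares a variable that depends quadratically on `x` with one that is independent of `x`.

```python
import numpy as np


def binned_mutual_information(x, y, bins=16):
    """Plug-in estimate of I(X;Y) in nats from a 2-D histogram.

    Assumption: equal-width binning; this is an illustrative estimator only.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y, shape (1, bins)
    mask = pxy > 0                            # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))


rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-1.0, 1.0, n)
y_nonlinear = x**2 + 0.05 * rng.normal(size=n)   # depends on x, but not linearly
y_independent = rng.uniform(-1.0, 1.0, n)        # unrelated to x

pearson_nonlinear = float(np.corrcoef(x, y_nonlinear)[0, 1])
mi_nonlinear = binned_mutual_information(x, y_nonlinear)
mi_independent = binned_mutual_information(x, y_independent)

print(f"Pearson r (x, x^2 + noise): {pearson_nonlinear:.3f}")
print(f"MI (x, x^2 + noise):        {mi_nonlinear:.3f} nats")
print(f"MI (x, independent):        {mi_independent:.3f} nats")
```

In this setup the Pearson correlation between `x` and `y_nonlinear` is close to zero even though the two are strongly dependent, while the mutual-information estimate is clearly larger than for the independent pair. Histogram estimates are biased for finite samples, so the values should be read as a qualitative comparison only.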

Highlights

  • In the era of big data, there is increasing interest in clustering variables to minimize data redundancy and maximize variable relevancy

  • We reformulate the problem of variable clustering from an information-theoretic perspective that does not require assumptions about the data structure to identify nonlinear interdependence among variables

  • Both simulation and real-world case studies show that the proposed methodology effectively reveals the nonlinear interdependence structures among variables and outperforms traditional variable clustering algorithms such as hierarchical clustering


Summary

Introduction

In the era of big data, there is increasing interest in clustering variables to minimize data redundancy and maximize variable relevancy. We develop Dirichlet process (DP) models to cluster variables based on the mutual-information measures among them. The orthonormalized variables in each cluster are integrated with a group elastic-net model to improve the performance of predictive modeling. Both simulation and real-world case studies show that the proposed methodology effectively reveals the nonlinear interdependence structures among variables and outperforms traditional variable clustering algorithms such as hierarchical clustering. In the 21st century, wireless sensing, electronic health records, and the health Internet of Things are increasingly adopted to assist in clinical decision making [2,3,4]. This volume of information from multiple sources provides numerous variables for the predictive model under consideration.

