Abstract

In the analysis of real-world data, it is useful to learn a latent variable model that represents the data generation process. In this setting, latent tree models are useful because they are able to capture complex relationships while being easily interpretable. In this paper, we propose two incremental algorithms for learning forests of latent trees. Unlike current methods, the proposed algorithms are based on the variational Bayesian framework, which allows them to introduce uncertainty into the learning process and work with mixed data. The first algorithm, incremental learner , determines the forest structure and the cardinality of its latent variables in an iterative search process. The second algorithm, constrained incremental learner , modifies the previous method by considering only a subset of the most prominent structures in each step of the search. Although restricting each iteration to a fixed number of candidate models limits the search space, we demonstrate that the second algorithm returns almost identical results for a small fraction of the computational cost. We compare our algorithms with existing methods by conducting a comparative study using both discrete and continuous real-world data. In addition, we demonstrate the effectiveness of the proposed algorithms by applying them to data from the 2018 Spanish Living Conditions Survey. All code, data, and results are available at https://github.com/ferjorosa/incremental-latent-forests .

Highlights

  • Real-world data are often complex and high-dimensional

  • INCREMENTAL LEARNING OF LATENT FORESTS we propose a search-based method that hill-climbs the space of latent forests using the variational Bayes (VB) framework

  • Considering that directly searching this space requires the evaluation of a large number of candidate structures, a constrained variant that only evaluates the most prominent α candidates of each iteration is developed

Read more

Summary

Introduction

Real-world data are often complex and high-dimensional. In this setting, latent variables have proven to be useful in analyzing their generation process, as they are able to represent underlying data concepts by grouping observed variables. LTMs are appealing because can capture complex relationships with a simple tree structure that is interpretable. They allow for exact probabilistic inference in linear time [3]. For this reason, LTMs have proven to be valuable in many areas, such as classification [4]–[6], topic detection [7], probabilistic inference [8] and cluster analysis [9]–[13]

Objectives
Methods
Findings
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.