Abstract

Laplacian mixture models identify overlapping regions of influence in unlabeled graph and network data in a scalable and computationally efficient way, yielding useful low-dimensional representations. By combining Laplacian eigenspace and finite mixture modeling methods, they provide probabilistic or fuzzy dimensionality reductions or domain decompositions for a variety of input data types, including mixture distributions, feature vectors, and graphs or networks. Provably optimal recovery by the algorithm is shown analytically for a nontrivial class of cluster graphs. Heuristic approximations for scalable high-performance implementations are described and empirically tested. Connections to PageRank and community detection in network analysis demonstrate the wide applicability of this approach. The origins of fuzzy spectral methods, beginning with generalized heat or diffusion equations in physics, are reviewed and summarized. Comparisons to other dimensionality reduction and clustering methods for challenging unsupervised machine learning problems are also discussed.

Highlights

  • Extracting meaningful knowledge from large and nonlinearly-connected data structures is of primary importance for efficiently utilizing data

  • The duality between Laplacian matrices and probability distributions can be used for the purposes of statistical analyses and unsupervised machine learning

  • Laplacian mixture models recover the structure of interpolating cluster graphs exactly, converting the first K eigenvectors into binary conditional probabilities of belonging to each cluster (see the sketch below)
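The sketch below illustrates, in broad strokes, how the first K Laplacian eigenvectors might be turned into per-node membership probabilities that sum to one. It is not the paper's algorithm: the combinatorial Laplacian, the k-means step in the spectral embedding, and the softmax-style responsibilities (with an arbitrary temperature) are assumptions made purely for illustration.

# Minimal, illustrative sketch (not the paper's exact method): embed nodes with
# the first K Laplacian eigenvectors, then form soft memberships that sum to 1
# at each node, i.e. a partition of unity over K overlapping regions.
import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2

def soft_memberships(A, K, temperature=0.1):
    """A: dense symmetric adjacency matrix; K: assumed number of components."""
    L = np.diag(A.sum(axis=1)) - A                  # combinatorial graph Laplacian
    _, V = eigh(L, subset_by_index=[0, K - 1])      # K lowest-eigenvalue eigenvectors
    centers, _ = kmeans2(V, K, minit="++", seed=0)  # crude centers in the embedding
    d2 = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    M = np.exp(-d2 / temperature)                   # softmax-style responsibilities
    return M / M.sum(axis=1, keepdims=True)         # rows sum to 1 (partition of unity)

# Toy graph: two triangles joined by one weak edge.
A = np.zeros((6, 6))
for block in ([0, 1, 2], [3, 4, 5]):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[2, 3] = A[3, 2] = 0.1
print(soft_memberships(A, K=2).round(2))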


Summary

Introduction

Extracting meaningful knowledge from large and nonlinearly-connected data structures is of primary importance for efficiently utilizing data. The duality between Laplacian matrices and probability distributions can be used for statistical analysis and unsupervised machine learning. Their spectral decompositions provide data-dependent bases for describing patterns that represent global, nonhierarchical structures in the underlying graph. The component conditional probabilities $\{p_k(x) = P(k \mid x)\}_{k=1}^{m}$ can still be computed even when knowledge of the underlying mixture distribution f(x) is unavailable or not required. This makes the partition of unity form of mixture models (4) very useful in practice, since it applies even in cases where estimating f(x) is not relevant or feasible, such as cluster analysis, graph partitioning, and domain decomposition. Rather than being dichotomous as implied by their names, soft and hard clustering approaches are complementary, and may be used together in one analysis to answer different questions about the same dataset.
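For reference, in standard finite-mixture notation (a generic sketch; the component densities $f_k(x)$ and weights $\pi_k$ below are the usual textbook definitions, not reproduced from this excerpt), the partition of unity form reads

$$
f(x) \;=\; \sum_{k=1}^{m} \pi_k\, f_k(x),
\qquad
p_k(x) \;=\; P(k \mid x) \;=\; \frac{\pi_k\, f_k(x)}{f(x)},
\qquad
\sum_{k=1}^{m} p_k(x) \;=\; 1 \quad \text{for every } x,
$$

so the conditional memberships $p_k(x)$ form a partition of unity and can be worked with directly, without ever constructing f(x) itself.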

Materials and methods
Results and discussion
Conclusion
