Abstract
Simplicial complexes are higher-order combinatorial structures that have been used to represent real-world complex systems. In this paper, we focus on local patterns in simplicial complexes called simplets, a generalization of graphlets. We study the problem of counting simplets of a given size in a given simplicial complex. For this problem, we extend a sampling algorithm based on color coding from graphs to simplicial complexes, an extension that requires essential technical novelty. We theoretically analyze our proposed algorithm, named SC3, showing its correctness, unbiasedness, convergence, and time/space complexity. Through extensive experiments on sixteen real-world datasets, we show the superiority of SC3 over the baseline methods in terms of accuracy, speed, and scalability. We use the counts given by SC3 for simplicial complex analysis, especially for characterization, which is in turn used for simplicial complex clustering; there, the SC3-based characterization aligns closely with domain-based similarity. Additionally, we explore a variant of simplet counting (specifically, estimating the relative counts of simplets) under realistic scenarios where the entire simplicial complex is not provided at once but can only be partially accessed, for instance, through a limited number of API calls. For such scenarios, we propose a random-walk-based sampling algorithm, SCRW, and analyze its theoretical properties. In our experiments, SCRW requires, on average, 16.5× less memory than SC3, while the speed-accuracy trade-offs provided by the two methods are comparable.