TriSig: Evaluating the statistical significance of triclusters

Rui Henriques, Rafael S. Costa, Leonardo Alexandre

doi:10.1016/j.patcog.2023.110231

Abstract

Tensor data analysis allows researchers to uncover novel patterns and relationships that cannot be obtained from tabular data alone. The information inferred from multi-way patterns can offer valuable insights into disease progression, bioproduction processes, behavioral responses, weather fluctuations, or social dynamics. However, spurious patterns often hamper this process. This work proposes a statistical frame to assess the probability that patterns in tensor data deviate from null expectations, extending well-established principles for assessing the statistical significance of patterns in tabular data. It offers a principled discussion of binomial testing to mitigate false-positive discoveries in light of variable dependencies, temporal associations and misalignments, and multiple-hypothesis correction. Results from applying triclustering algorithms to distinct real-world case studies in biotechnological domains support the validity of the proposed statistical frame while revealing vulnerabilities of reference triclustering searches. The proposed assessment can be incorporated into existing triclustering algorithms to minimize spurious occurrences, rank patterns, and further prune the search space, reducing their computational complexity.

Full Text