Mining Order-Preserving Submatrices Under Data Uncertainty: A Possible-World Approach

Ji Cheng,Wilfred Ng,Xiaotian Hao,Da Yan

doi:10.1109/icde.2019.00106

Abstract

Given a data matrix D, a submatrix S of D is an order-preserving submatrix (OPSM) if there is a permutation of the columns of S, under which the entry values of each row in S are strictly increasing. OPSM mining is widely used in real-life applications such as identifying coexpressed genes, and finding customers with similar preference. However, noise is ubiquitous in real data matrices due to variable experimental conditions and measurement errors, which makes conventional OPSM mining algorithms inapplicable. No previous work has ever combated uncertain value intervals using the possible world semantics. We establish two different definitions of significant OPSMs based on the possible world semantics: (1) expected support based and (2) probabilistic frequentness based. An optimized dynamic programming approach is proposed to compute the probability that a row supports a particular column permutation, and several effective pruning rules are introduced to efficiently prune insignificant OPSMs. These techniques are integrated into our two OPSM mining algorithms, based on prefix-projection and Apriori respectively. Extensive experiments on real microarray data demonstrate that the OPSMs found by our algorithms have a much higher quality than those found by existing approaches.

Full Text