Abstract

The problem of finding large average submatrices of a real-valued matrix arises in the exploratory analysis of data from a variety of disciplines, ranging from genomics to the social sciences. In this paper we provide a detailed asymptotic analysis of large average submatrices of an $n \times n$ Gaussian random matrix. The first part of the paper addresses global maxima. For fixed $k$ we identify the average and the joint distribution of the $k \times k$ submatrix having the largest average value. As a dual result, we establish that the size of the largest square submatrix with average larger than a fixed positive constant is, with high probability, equal to one of two consecutive integers that depend on the threshold and the matrix dimension $n$. The second part of the paper addresses local maxima. Specifically, we consider submatrices with dominant row and column sums that arise as the local optima of iterative search procedures for large average submatrices. For fixed $k$, we identify the limiting average value and joint distribution of a $k \times k$ submatrix conditioned to be a local maximum. In order to understand the density of such local optima and explain the quick convergence of such iterative procedures, we analyze the number $L_n(k)$ of local maxima, beginning with exact asymptotic expressions for the mean and fluctuation behavior of $L_n(k)$. For fixed $k$, the mean of $L_n(k)$ is $\Theta(n^{k}/(\log n)^{(k-1)/2})$ while the standard deviation is $\Theta(n^{2k^2/(k+1)}/(\log n)^{k^2/(k+1)})$. Our principal result is a Gaussian central limit theorem for $L_n(k)$ that is based on a new variant of Stein’s method.
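The notions of global and local maxima in the abstract can be made concrete for small matrices. The following is a minimal Python sketch under one stated assumption: "dominant row and column sums" is read as "the $k$ chosen rows have the $k$ largest row sums over the chosen columns, and the $k$ chosen columns have the $k$ largest column sums over the chosen rows". All function names (`is_local_max`, `alternating_search`, `count_local_maxima`, `max_average_submatrix`) are hypothetical. The alternating best-columns/best-rows update is an illustrative iterative search and is not necessarily the procedure analyzed in the paper. Brute-force enumeration is only feasible for very small $n$.

```python
import itertools
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Index = Tuple[int, ...]


def gaussian_matrix(n: int, rng: random.Random) -> Matrix:
    """An n x n matrix with i.i.d. standard Gaussian entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]


def top_k(scores: Sequence[float], k: int) -> Index:
    """Indices of the k largest scores, sorted by index (ties broken arbitrarily)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return tuple(sorted(order[:k]))


def best_rows(A: Matrix, cols: Sequence[int], k: int) -> Index:
    # Sum each row over the chosen columns only, then keep the k largest.
    return top_k([sum(row[j] for j in cols) for row in A], k)


def best_cols(A: Matrix, rows: Sequence[int], k: int) -> Index:
    # Sum each column over the chosen rows only, then keep the k largest.
    return top_k([sum(A[i][j] for i in rows) for j in range(len(A[0]))], k)


def is_local_max(A: Matrix, rows: Index, cols: Index, k: int) -> bool:
    """Assumed definition: chosen rows and chosen columns both have dominant sums."""
    rows_ok = best_rows(A, cols, k) == tuple(sorted(rows))
    cols_ok = best_cols(A, rows, k) == tuple(sorted(cols))
    return rows_ok and cols_ok


def alternating_search(A: Matrix, k: int, rng: random.Random,
                       max_iter: int = 1000) -> Tuple[Index, Index]:
    """Alternate best-columns / best-rows updates from a random column set.

    No update decreases the submatrix sum, so with no ties the loop reaches a
    fixed point (a local maximum as defined above) unless max_iter runs out.
    """
    cols = tuple(sorted(rng.sample(range(len(A[0])), k)))
    rows = best_rows(A, cols, k)
    for _ in range(max_iter):
        new_cols = best_cols(A, rows, k)
        new_rows = best_rows(A, new_cols, k)
        if (new_rows, new_cols) == (rows, cols):
            break
        rows, cols = new_rows, new_cols
    return rows, cols


def all_pairs(A: Matrix, k: int):
    # Every (row set, column set) pair of size k: C(n, k)^2 candidates.
    for rows in itertools.combinations(range(len(A)), k):
        for cols in itertools.combinations(range(len(A[0])), k):
            yield rows, cols


def count_local_maxima(A: Matrix, k: int) -> int:
    """Brute-force L_n(k): the number of k x k local maxima."""
    return sum(is_local_max(A, r, c, k) for r, c in all_pairs(A, k))


def max_average_submatrix(A: Matrix, k: int) -> Tuple[float, Index, Index]:
    """Brute-force global maximum of the k x k submatrix average."""
    return max(
        (sum(A[i][j] for i in r for j in c) / (k * k), r, c)
        for r, c in all_pairs(A, k)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    A = gaussian_matrix(8, rng)
    rows, cols = alternating_search(A, 2, rng)
    print("search result:", rows, cols, is_local_max(A, rows, cols, 2))
    print("L_8(2) =", count_local_maxima(A, 2))
    print("global max average:", max_average_submatrix(A, 2)[0])
```

Because no update lowers the submatrix sum, the global maximum is itself a local maximum, while the search may stop at a local maximum with a smaller average. With $n = 8$ and $k = 2$ the enumeration already covers $\binom{8}{2}^2 = 784$ candidate pairs. Such experiments therefore illustrate the definitions rather than the asymptotic rates stated above.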

