Abstract

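In this paper, we prove a central limit theorem (CLT) for the sample canonical correlation coefficients between two high-dimensional random vectors with finite rank correlations. More precisely, consider two random vectors $\widetilde{x} = x + Az$ and $\widetilde{y} = y + Bz$, where $x \in \mathbb{R}^{p}$, $y \in \mathbb{R}^{q}$ and $z \in \mathbb{R}^{r}$ are independent random vectors with i.i.d. entries of mean zero and variance one, and $A \in \mathbb{R}^{p \times r}$ and $B \in \mathbb{R}^{q \times r}$ are two arbitrary deterministic matrices. Given $n$ samples of $\widetilde{x}$ and $\widetilde{y}$, we stack them into two matrices $\mathcal{X} = X + AZ$ and $\mathcal{Y} = Y + BZ$, where $X \in \mathbb{R}^{p \times n}$, $Y \in \mathbb{R}^{q \times n}$ and $Z \in \mathbb{R}^{r \times n}$ are random matrices with i.i.d. entries of mean zero and variance one. Let $\widetilde{\lambda}_1 \ge \widetilde{\lambda}_2 \ge \cdots \ge \widetilde{\lambda}_r$ be the largest $r$ eigenvalues of the sample canonical correlation (SCC) matrix $C_{\mathcal{X}\mathcal{Y}} = (\mathcal{X}\mathcal{X}^{\top})^{-1/2} \mathcal{X}\mathcal{Y}^{\top} (\mathcal{Y}\mathcal{Y}^{\top})^{-1} \mathcal{Y}\mathcal{X}^{\top} (\mathcal{X}\mathcal{X}^{\top})^{-1/2}$, and let $t_1 \ge t_2 \ge \cdots \ge t_r$ be the squares of the population canonical correlation coefficients between $\widetilde{x}$ and $\widetilde{y}$. Under certain moment assumptions, we show that there exists a threshold $t_c \in (0,1)$ such that if $t_i > t_c$, then $\sqrt{n}\,(\widetilde{\lambda}_i - \theta_i)$ converges weakly to a centered normal distribution, where $\theta_i$ is a fixed outlier location determined by $t_i$. Our proof uses a self-adjoint linearization of the SCC matrix and a sharp local law on the inverse of the linearized matrix.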
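As an illustration of the model only, the following minimal sketch simulates the setup described above and computes the eigenvalues of the SCC matrix together with the population quantities t_i. It is not taken from the paper: the function names are hypothetical, Gaussian entries are assumed for convenience (the abstract only requires i.i.d. entries of mean zero and variance one under moment assumptions), and the SCC eigenvalues are obtained through a QR and SVD route rather than by forming the inverse square roots explicitly.

```python
import numpy as np


def sample_scc_eigenvalues(A, B, n, rng):
    """Eigenvalues of the SCC matrix for n simulated samples (hypothetical helper).

    Assumption: entries of X, Y, Z are standard Gaussian. Assumes n >= max(p, q).
    """
    p, r = A.shape
    q, _ = B.shape
    X = rng.standard_normal((p, n))
    Y = rng.standard_normal((q, n))
    Z = rng.standard_normal((r, n))
    X_tilde = X + A @ Z  # stacked samples of x + A z, shape (p, n)
    Y_tilde = Y + B @ Z  # stacked samples of y + B z, shape (q, n)

    # Orthonormal bases for the row spaces of the two data matrices.
    Qx, _ = np.linalg.qr(X_tilde.T, mode="reduced")  # shape (n, p)
    Qy, _ = np.linalg.qr(Y_tilde.T, mode="reduced")  # shape (n, q)

    # The singular values of Qx^T Qy are the sample canonical correlations;
    # their squares are the nonzero eigenvalues of the SCC matrix.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    lam = np.zeros(p)
    lam[: s.size] = s**2  # pad with zeros when q < p
    return lam  # sorted in decreasing order


def population_squared_correlations(A, B):
    """Squared population canonical correlations t_1 >= ... >= t_r.

    Uses Cov(x~) = I + A A^T, Cov(y~) = I + B B^T, Cov(x~, y~) = A B^T,
    which follow from the independence and unit variance of x, y, z.
    """
    p, r = A.shape
    q, _ = B.shape
    Sx = np.eye(p) + A @ A.T
    Sy = np.eye(q) + B @ B.T
    Sxy = A @ B.T
    # Similar to the symmetric population matrix, so it has the same eigenvalues.
    M = np.linalg.solve(Sx, Sxy) @ np.linalg.solve(Sy, Sxy.T)
    t = np.sort(np.linalg.eigvals(M).real)[::-1]
    return t[:r]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, q, r, n = 50, 80, 1, 1000
    A = np.zeros((p, r))
    B = np.zeros((q, r))
    A[0, 0] = 3.0  # illustrative rank-one signal strengths
    B[0, 0] = 3.0
    print("population t:", population_squared_correlations(A, B))
    print("top sample eigenvalues:", sample_scc_eigenvalues(A, B, n, rng)[:3])
```

Repeating the simulation over many independent draws and inspecting the fluctuations of the top eigenvalue is one way to visualize the kind of limit the theorem describes; the sketch does not compute the threshold t_c or the outlier location θ_i, which are not specified in the abstract.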
