Abstract

We propose the Jaccard matrix (JM) and the Jaccard cell (JC), define them as extensions of the Jaccard index, and analyze them theoretically and numerically. For data on the Euclidean plane, the JM can be derived as a sparse matrix. We show that the JC inherits the similarity property of the Jaccard index, behaving as an exponential function of mutual information. We confirm, theoretically and numerically, that the local correlation coefficient of data on the Euclidean plane links the JC to the mutual information. Although the grid cell size used to construct the JM could in principle be chosen arbitrarily, the matrix becomes less informative when the cells are too large or too small to distinguish the data clusters appropriately. The JM therefore requires a computational procedure for determining a cell size on an appropriate scale. Maximizing the variance of the JCs, which behaves as a parabolic function of the cell-size parameter, yields a unique cell size located in the middle range of that parameter. From the JM, one can derive an index that extracts nonlinear correlation in the data. As such an index, the maximized standard deviation of the JCs is a decreasing function of the noise scale of the data under the constraint conditions. The ability to determine a homogeneous rectangular grid pattern for the JM may be a significant feature for detecting nonlinear correlation. In summary, this study presents a nonlinear filter that can serve as an efficient component of explainable AI and statistics.
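The grid construction and the variance-maximizing choice of cell size can be illustrated with a small Python sketch. The abstract does not give the exact definitions, so this code rests on stated assumptions rather than the authors' method: we assume (hypothetically) that the JC for grid cell (i, j) is the Jaccard index |X_i ∩ Y_j| / |X_i ∪ Y_j|, where X_i is the set of points whose x-coordinate falls in column i and Y_j is the set whose y-coordinate falls in row j. We also assume that the variance is taken over the nonzero JCs and that the cell size is chosen from a fixed list of candidates. The function names `jaccard_matrix`, `jc_variance`, and `select_cell_size` are illustrative only.

```python
import numpy as np


def jaccard_matrix(x, y, cell_size):
    """Hypothetical JM on a square grid whose cells have side `cell_size`.

    Assumption: J[i, j] = |X_i & Y_j| / |X_i | Y_j|, where X_i holds the points
    in column i (binned by x) and Y_j holds the points in row j (binned by y).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Column and row index of every point, with the grid anchored at the data minimum.
    col = np.floor((x - x.min()) / cell_size).astype(int)
    row = np.floor((y - y.min()) / cell_size).astype(int)
    # Joint counts |X_i & Y_j|: most cells stay empty, so the JM is sparse in practice.
    joint = np.zeros((col.max() + 1, row.max() + 1))
    np.add.at(joint, (col, row), 1)
    # Marginal counts |X_i| (per column) and |Y_j| (per row), broadcast to the grid.
    col_count = joint.sum(axis=1, keepdims=True)
    row_count = joint.sum(axis=0, keepdims=True)
    union = col_count + row_count - joint
    # Cells whose column and row are both empty have an empty union; define J = 0 there.
    return np.divide(joint, union, out=np.zeros_like(joint), where=union > 0)


def jc_variance(x, y, cell_size):
    """Variance of the nonzero JCs (taking only nonzero cells is an assumption)."""
    jm = jaccard_matrix(x, y, cell_size)
    return float(jm[jm > 0].var())


def select_cell_size(x, y, candidates):
    """Return the candidate cell size that maximizes the JC variance, plus all variances."""
    variances = [jc_variance(x, y, c) for c in candidates]
    best = int(np.argmax(variances))
    return candidates[best], variances


if __name__ == "__main__":
    # Synthetic example: a nonlinear (quadratic) relation with additive noise.
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, 500)
    y = x**2 + rng.normal(scale=0.05, size=x.size)
    sizes = np.linspace(0.02, 1.0, 50)
    best, variances = select_cell_size(x, y, sizes)
    # The square root of the maximized variance plays the role of the nonlinear-correlation index.
    print("selected cell size:", best)
    print("max JC standard deviation:", np.sqrt(max(variances)))
```

Under these assumptions, the variance is exactly zero at both extremes. When the grid collapses to a single cell, there is only one JC, and it equals 1. When the cells are so fine that every occupied column and row holds exactly one point, every nonzero JC equals 1. A maximum in between is therefore consistent with the abstract's description of the selected cell size lying in the middle range of the parameter.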
