The cluster patterns of features in map space reflect both the geometric attributes of individual features and their spatial adjacency relationships. These patterns also embody the results of spatial cognition under Gestalt principles. Describing non-linear spatial cluster patterns as effective regular structures is a fundamental task when applying deep learning to the recognition of feature cluster patterns. In this study, drawing on the concept of texture co-occurrence matrices for regular gray-scale images, we used Voronoi diagrams to construct a tessellation structure for building polygons. Extending first-order texton co-occurrence matrices, we established three-dimensional texton co-occurrence matrices for building polygons that consider five attributes, including building size, shape, orientation, and density, and cover 64 combinations of second-order neighboring directions. This matrix makes the latent Gestalt spatial characteristics of building polygon clusters concrete in the form of a three-dimensional sparse matrix, which is then used as the input to a deep convolutional neural network that recognizes building polygon cluster patterns. By adjusting and optimizing the network structure and training strategies, validating the approach on practical case studies, and comparing it with other models, we demonstrated the effectiveness of the second-order texton co-occurrence matrix in describing the characteristics of building polygon clusters.
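To make the idea of a direction-indexed co-occurrence matrix concrete, the following is a minimal sketch and not the construction used in this study. It assumes that Voronoi adjacency is already available as a list of index pairs, that each building carries a single quantized attribute level, and that the 64 direction combinations arise from pairing two 8-sector bearings along a path i -> j -> k. The names `direction_bin`, `second_order_cooccurrence`, and `N_DIRS` are hypothetical.

```python
import math
from collections import defaultdict

N_DIRS = 8  # assumption: eight 45-degree sectors, so 8 x 8 = 64 direction pairs


def direction_bin(src, dst):
    """Quantize the bearing from src to dst into one of N_DIRS sectors."""
    angle = math.atan2(dst[1] - src[1], dst[0] - src[0]) % (2 * math.pi)
    sector = 2 * math.pi / N_DIRS
    # Shift by half a sector so that bin 0 is centred on due east.
    return int((angle + sector / 2) // sector) % N_DIRS


def second_order_cooccurrence(centroids, levels, edges, n_levels):
    """Count (direction pair, level of i, level of k) over paths i -> j -> k.

    centroids: list of (x, y) building centroids
    levels:    list of quantized attribute levels in range(n_levels)
    edges:     adjacency pairs (assumed to come from a Voronoi tessellation)
    Returns a nested list of shape [N_DIRS * N_DIRS][n_levels][n_levels].
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    counts = [[[0] * n_levels for _ in range(n_levels)]
              for _ in range(N_DIRS * N_DIRS)]
    for i in list(neighbours):
        for j in neighbours[i]:
            d1 = direction_bin(centroids[i], centroids[j])
            for k in neighbours[j]:
                if k == i:
                    continue  # skip paths that return to the start
                d2 = direction_bin(centroids[j], centroids[k])
                counts[d1 * N_DIRS + d2][levels[i]][levels[k]] += 1
    return counts
```

Most cells of the resulting array stay at zero for realistic building groups, which is consistent with the sparse three-dimensional matrix described above. Stacking one such array per attribute would be one way to feed several attributes to a convolutional network, though that layout is likewise an assumption.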