This article proposes a 35-dimensional (35D) form index system to quantitatively describe the 3D form of urban blocks. Using the t-distributed stochastic neighbor embedding (t-SNE) algorithm for cluster analysis, the visually complex and disordered 3D texture of a city is translated into distinct form clusters, so that the overall urban form structure can be recognized from the block perspective. The methodology comprises experiments in the central area of Nanjing and a comparative analysis of three neighboring cities: Shanghai, Hangzhou, and Suzhou. The results show that the form indices and cluster analysis recognize block form types effectively. The four cities differ markedly in the number of clusters and in how the clusters are distributed. Shanghai has the fewest cluster types and a compact distribution, Suzhou has the most cluster types and a dispersed distribution, and Hangzhou and Nanjing are similar to each other and fall between Shanghai and Suzhou. Correlation analysis reveals a negative relationship between the number of cluster types and the level of urban socioeconomic development in similar areas. These findings imply that governments and urban planners can use neighborhood morphological types to devise customized spatial management and renewal strategies. The overall urban structure can be improved by strategically reducing the number of neighborhood morphological types and the dispersion of their distribution, thereby fostering socioeconomic development.
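The comparison step described above (counting cluster types per area and correlating that count with a development indicator) can be illustrated with a minimal sketch. It assumes that each block already carries a cluster label from the t-SNE-based analysis. The area names, labels, and development values below are synthetic placeholders rather than data from the study, and the helper names `cluster_summary` and `pearson` are hypothetical.

```python
from collections import Counter
from math import sqrt


def cluster_summary(labels):
    """Return (number of cluster types, share of blocks in each type)."""
    counts = Counter(labels)
    total = len(labels)
    shares = {cluster: n / total for cluster, n in counts.items()}
    return len(counts), shares


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Synthetic placeholder data: one cluster label per block, grouped by area.
labels_by_area = {
    "area_a": [0, 0, 1, 1, 0, 1],
    "area_b": [0, 1, 2, 3, 0, 2],
    "area_c": [0, 1, 2, 3, 4, 5],
}
# Hypothetical socioeconomic development index per area (arbitrary units).
development = {"area_a": 3.0, "area_b": 2.0, "area_c": 1.0}

areas = sorted(labels_by_area)
n_types = {area: cluster_summary(labels_by_area[area])[0] for area in areas}
r = pearson(
    [n_types[area] for area in areas],
    [development[area] for area in areas],
)

for area in areas:
    print(area, "cluster types:", n_types[area])
print("correlation between type count and development:", round(r, 3))
```

With these placeholder values the coefficient is negative only because the toy data were constructed that way. The sketch shows the shape of the calculation and does not reproduce any reported result.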