Community outdoor public spaces are indispensable to urban residents’ daily lives. Analyzing these spaces from a behavioral perspective is an effective way to support human-centered urban development. Traditional behavioral analysis often relies on manually collected data, which is time-consuming and labor-intensive to gather and limited in breadth. Sensors have greatly increased the breadth of behavioral data, but their accuracy remains insufficient, especially for the fine-grained differentiation of populations and behaviors. Computer vision is more efficient at distinguishing populations and recognizing behaviors, yet most existing applications remain limited: behavior recognition is largely confined to pedestrian trajectory recognition, and few studies recognize the diverse behaviors of crowds. To address these gaps, this paper proposes a more efficient approach that uses computer vision tools to examine different populations and behaviors and to obtain important statistical measures of spatial behavior, taking the Bajiao Cultural Square in Beijing as a test bed. The population and behavior recognition model incorporates two improvement strategies. First, it leverages an attention mechanism, which emulates human selective cognition by accentuating pertinent information while disregarding extraneous data. Specifically, the ResNet backbone network is refined by integrating channel attention, which amplifies critical feature channels or suppresses irrelevant ones, thereby enhancing the efficacy of population and behavior recognition. Second, it combines public datasets with self-made data to construct the dataset the model requires, improving the robustness of the detection model in specific scenarios.
The model distinguishes five types of people and six kinds of behaviors with an identification accuracy of 83%, achieving fine-grained behavior detection for different populations. To a certain extent, it addresses a limitation of traditional data, namely that large-scale behavioral data are difficult to refine. The population and behavior recognition model was adapted and applied in conjunction with spatial typology analysis, from which we conclude that different crowds have different behavioral preferences. Different crowds use space inconsistently, behavior is inconsistent with spatial function, and behavior is concentrated over time. These findings provide more comprehensive and reliable decision support for fine-grained planning and design.
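
To make the channel attention idea mentioned above concrete, the following is a minimal sketch in plain Python of a squeeze-and-excitation style gate. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify which channel attention variant is integrated into ResNet, and the function names, the toy feature map, and the hand-picked weights are all hypothetical. In a real network the weights would be learned and the block would sit inside the residual stages.

```python
import math


def global_avg_pool(feature_map):
    """Squeeze step: reduce each H x W channel to its mean value."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def dense(vec, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [
        sum(w * v for w, v in zip(row, vec)) + b
        for row, b in zip(weights, bias)
    ]


def channel_attention(feature_map, w1, b1, w2, b2):
    """Return the re-weighted feature map and the per-channel gates.

    Assumed variant: squeeze (global average pool), excitation
    (bottleneck dense layer + ReLU, dense layer + sigmoid), then scale.
    """
    squeezed = global_avg_pool(feature_map)
    hidden = [max(0.0, h) for h in dense(squeezed, w1, b1)]
    gates = [1.0 / (1.0 + math.exp(-z)) for z in dense(hidden, w2, b2)]
    scaled = [
        [[gate * x for x in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy input: 2 channels of size 2 x 2 (hypothetical values).
fmap = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[4.0, 4.0], [4.0, 4.0]],
]
# Hand-picked illustrative weights: one hidden unit, two output gates.
w1, b1 = [[1.0, 1.0]], [0.0]
w2, b2 = [[-1.0], [1.0]], [0.0, 0.0]

scaled, gates = channel_attention(fmap, w1, b1, w2, b2)
```

With these illustrative weights the gate for the first channel falls below 0.5 and the gate for the second rises above it, so the first channel is suppressed and the second is largely preserved. This is the amplify-or-suppress behavior the abstract attributes to channel attention.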