In recent years, scene understanding has become a hot research topic because of its practical use in perceiving, analyzing, and recognizing dynamic scenes in applications such as GPS monitoring systems, drone targeting, autonomous driving, and tourist guidance. The goal of scene understanding is to make machines see as humans do, that is, to accurately recognize the contents of a scene and the location being observed. This involves two tasks: (1) describing the whole environment and (2) describing the action taking place in that environment. Because scene analysis is complex, recognizing multiple objects and the relations between them remains a challenging part of the research. In this paper, we propose a novel approach to scene understanding that integrates multiple-object detection/segmentation and scene labeling using geometric features, histogram of oriented gradients (HOG), and scale-invariant feature transform (SIFT) descriptors. The complete procedure of the proposed model comprises resizing and denoising the dataset images, multiple-object segmentation and detection, feature extraction, and multiple-object recognition using a multi-layer kernel sliding perceptron. Scene recognition is then achieved using multi-class logistic regression. Finally, two datasets, MSRC and UIUC Sports, are used for the experimental evaluation of the proposed method. The proposed method accurately handles physical exclusion between complex objects and object occlusion. As a result, it outperforms other state-of-the-art approaches in terms of accuracy.
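To make the final stage concrete, the following is a minimal sketch of multi-class (softmax) logistic regression that maps a set of recognized objects to a scene label. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the object vocabulary, the scene names, the toy training counts, the object-count feature vector, and the plain gradient-descent training loop are all hypothetical choices made for the example.

```python
import math

# Hypothetical object vocabulary and scene labels (not taken from the paper).
OBJECTS = ["boat", "water", "grass", "ball", "car", "road"]
SCENES = ["sailing", "field_sport", "street"]


def softmax(scores):
    """Turn raw class scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(W, b, x):
    """Linear score for each class: w_k . x + b_k."""
    return [sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(len(W))]


def train(X, y, n_classes, lr=0.1, epochs=300):
    """Fit softmax regression with per-sample gradient descent on cross-entropy."""
    n_features = len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, label in zip(X, y):
            probs = softmax(class_scores(W, b, x))
            for k in range(n_classes):
                # Gradient of cross-entropy w.r.t. score k: p_k - 1[k == label]
                grad = probs[k] - (1.0 if k == label else 0.0)
                for j in range(n_features):
                    W[k][j] -= lr * grad * x[j]
                b[k] -= lr * grad
    return W, b


def predict(W, b, x):
    """Return the index of the most probable scene."""
    scores = class_scores(W, b, x)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data (hypothetical): each row counts detected objects in OBJECTS order.
X_TRAIN = [
    [2, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0], [3, 1, 0, 0, 0, 0],  # sailing
    [0, 0, 1, 1, 0, 0], [0, 0, 1, 2, 0, 0], [0, 0, 2, 1, 0, 0],  # field_sport
    [0, 0, 0, 0, 2, 1], [0, 0, 0, 0, 1, 1], [0, 0, 1, 0, 3, 1],  # street
]
Y_TRAIN = [0, 0, 0, 1, 1, 1, 2, 2, 2]

if __name__ == "__main__":
    W, b = train(X_TRAIN, Y_TRAIN, n_classes=len(SCENES))
    detected = [1, 1, 0, 0, 0, 0]  # one boat and one water region
    print(SCENES[predict(W, b, detected)])
```

In this sketch the scene classifier sees only which objects were recognized and how many, so any error in the earlier segmentation and recognition stages propagates directly into the scene label.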