In this paper, we introduce a probabilistic approach for extracting complex hierarchical object structures from digital images in various vision applications. The proposed framework extends conventional marked point process (MPP) models by 1) admitting object-subobject ensembles in parent-child relationships, and 2) allowing corresponding objects to form coherent object groups through a Bayesian segmentation of the population. Unlike earlier, highly domain-specific attempts at MPP generalization, the proposed model is defined at an abstract level, providing clear interfaces for applications in various domains. We also introduce a global optimization process for the multi-layer framework that finds optimal entity configurations, taking into account the observed data, prior knowledge, and interactions between neighboring and hierarchically related objects. The proposed method is demonstrated in three application areas: built-in area analysis in remotely sensed images, traffic monitoring in airborne and mobile laser scanning (Lidar) data, and optical circuit inspection. A new benchmark database is published for the three test cases, and the model's performance is quantitatively evaluated.