In this paper we propose and examine several metric approaches to image comparison based on Morpho-Semantic (MS) and Semantic-Morphological (SM) models. The first class-based approach embeds MS and SM models into a metric space with weighted Lp metrics. It represents SM models as mosaic vector functions composed of semantic-morphological class expression maps; the feature descriptions of these maps yield a global feature description of an SM model as an SM vector. The second class-based approach relies on resource models, which comprise semantic-morphological class expression maps together with area resource values. It embeds these mosaic class expression maps with area resource values into a metric space with the Earth Mover’s Distance (EMD), defined through resource transportation between the maps. Finally, we propose an object-based approach to the metric embedding of SM models inspired by the Geometrical Difference Distance (GDD), which compares mosaic image shapes via weighted pairwise comparison of their region shapes. In this way we obtain the SM Difference Distance (SMDD) and its EMD-based version. The practical applicability of the proposed SM metrics is largely determined by the strategy for forming the feature set and by the parameter estimation scheme. Parameter tuning of SM metrics for comparing visual scenes or objects can be performed either as MS modeling (interpretation) of human subjective reasoning or as MS modeling (interpretation) of deep learning results. 
In both cases, fitting SM models and SM metrics could make human or DNN reasoning in scene comparison tasks partially transparent; allow different experts (or algorithms) to be compared, grouped, or clustered according to their SM-model parameter settings; and enable personalized post-training of neural network models that takes into account the individual SM settings of particular users, operators, or experts. This would combine the effectiveness of deep learning on huge training sets with partial transparency of reasoning and the ability to account for user preferences directly in terms of SM models, rather than by constructing artificial training sets through augmentation.
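
As a purely illustrative aid, the weighted Lp comparison of SM vectors mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in the paper: the function name `weighted_lp_distance`, the representation of an SM vector as a flat list of numbers, and the placement of the weights inside the sum are all hypothetical choices made for the example.

```python
from typing import Sequence


def weighted_lp_distance(
    u: Sequence[float],
    v: Sequence[float],
    weights: Sequence[float],
    p: float = 2.0,
) -> float:
    """Weighted Lp distance: (sum_i w_i * |u_i - v_i|**p) ** (1/p).

    `u` and `v` stand in for SM feature vectors and `weights` for the
    tunable metric parameters (both are assumptions of this sketch).
    """
    if not (len(u) == len(v) == len(weights)):
        raise ValueError("u, v and weights must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1 for the triangle inequality to hold")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    # Accumulate the weighted p-th powers of the coordinate differences.
    total = sum(w * abs(a - b) ** p for a, b, w in zip(u, v, weights))
    return total ** (1.0 / p)


# Hypothetical SM vectors with two features each.
d_euclid = weighted_lp_distance([0.0, 0.0], [3.0, 4.0], [1.0, 1.0], p=2.0)
d_manhattan = weighted_lp_distance([0.0, 0.0], [3.0, 4.0], [2.0, 1.0], p=1.0)
```

In this sketch the weights play the role of the parameters that would be tuned to match human judgments or deep learning results; with strictly positive weights the function is a metric on the vectors, whereas a zero weight ignores the corresponding feature and yields only a pseudometric.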