Improving the explainability of deep-learning-based Computer Vision models has recently become a compelling problem, as it ensures reliable predictions for the end user and enables more fine-grained classifications. Concept Bottleneck models have recently been proposed for image classification, partitioning the problem into two stages and thereby defining a hierarchy of concepts. So far, however, little work has been done to investigate the applicability of this approach to datasets with higher intra-class variability and ambiguity, or to discuss its flexibility for tasks other than whole-image classification. In this work we develop and discuss a Concept Bottleneck model for image segmentation, fine-grained object classification, and tracking, and compare it to more classical methods based on Mask R-CNN and image-similarity algorithms. All our models are trained and tested on a dataset composed of pictures of fridges filled with various objects; the method, however, can be applied to any fine-grained classification task. The proposed model makes full use of the concept hierarchy, exploiting the relationships between categories at the same hierarchical level and relying on a novel method for handling multi-label classification. We show that the fine-grained classification performance is on par with that of a regular Mask R-CNN, but with a significant improvement in explainability and in handling class confusion. New metrics are proposed to quantitatively evaluate this gain in explainability. We also demonstrate the effectiveness of the derived Concept Bottleneck features on related tasks, i.e., the tracking of objects between consecutive pictures in a sequence. The code is released as open source and available at https://opensource.silicon-austria.com/pittinof/hierarchical-concept-bottleneck.